Abstract

Deconvolution refers to the challenge of identifying two structured signals given only the sum of 
the two signals and prior information about their structures. A standard example is the problem of 
separating a signal that is sparse with respect to one basis from a signal that is sparse with respect to 
a second basis. Another familiar case is the problem of decomposing an observed matrix into a low- 
rank matrix plus a sparse matrix. This paper describes and analyzes a framework, based on convex 
optimization, for solving these deconvolution problems and many others. 

This work introduces a randomized signal model which ensures that the two structures are inco- 
herent, i.e., generically oriented. For an observation from this model, the calculus of spherical integral 
geometry provides an exact formula that describes when the optimization problem will succeed (or fail) 
to deconvolve the two constituent signals with high probability. This approach identifies a summary 
statistic that reflects the complexity of a particular signal. The difficulty of separating two structured, 
incoherent signals depends only on the total complexity of the two structures. 

Some applications include (i) deconvolving two signals that are sparse in mutually incoherent bases; 
(ii) decoding spread-spectrum transmissions in the presence of impulsive errors; and (iii) removing 
sparse corruptions from a low-rank matrix. In each case, the theoretical analysis of the convex decon- 
volution method closely matches its empirical behavior. 

1 Introduction 

In modern data-intensive science, it is common to observe a superposition of multiple information- 
bearing signals. Deconvolution refers to the challenge of separating out the constituent signals from 
the observation. A fundamental computational question is to understand when a tractable algorithm 
can successfully complete the deconvolution. Problems of this sort arise in fields as diverse as
acoustics [AEJ+12], astronomy [SDC03], communications [BMS06, BSFM07], geophysics [TBM79], image
processing [SED05, ESQD05], machine learning [CPW10], and statistics [CLMW11]. Some well-known
examples of convex methods for deconvolution include morphological component analysis [SDC03], robust
principal component analysis [CSPW11, CLMW11], and inpainting [ESQD05].

This work presents a general framework for deconvolution based on convex optimization. We study 
the geometry of the optimization problem, and we develop conditions that describe precisely when our 
method succeeds. Let us illustrate the major aspects of our approach through a concrete example. 

1.1 A first application: Morphological component analysis 

Starck et al. use deconvolution to model the problem of distinguishing stars from galaxies in an
astronomical image [SDC03]. This task requires hypotheses on the two types of objects. First, we must
assume that stars and galaxies exhibit different kinds of structure: stars appear as localized bright
points, while galaxies are wispy or filamented. Second, we must insist that the image is not so full of
stars, nor of galaxies, that they obscure one another. These two properties are modeled by the notions
of incoherence and sparsity. We solve the deconvolution problem using a method known as morphological
component analysis (MCA) [SDC03, SED05, ESQD05, BMS06].

1.1.1 The MCA signal model 

Suppose that we observe a superposition of two structured signals: 

$$z_0 = A x_0 + B y_0 \in \mathbb{R}^d.$$

We assume that the matrices $A$ and $B$ are known, while the vectors $x_0$ and $y_0$ are unknown. Each
column of $A$ contains an elementary structure that might appear in the first signal; the columns of $B$
reflect the structures in the second signal. The vector $x_0$ selects which columns of $A$ appear in the
first signal, e.g., stars in different locations, while $y_0$ generates the second signal, for example,
galaxies in different locations. Incoherence demands that the columns of $A$ and $B$ are weakly
correlated, and sparsity requires that $x_0$ and $y_0$ have few nonzero elements.

For simplicity, we assume that $A$ and $B$ are orthonormal bases. By changing coordinates, we may take
$A = I$, the identity matrix. The observation then has the form

$$z_0 = x_0 + Q y_0$$

for a known orthogonal matrix Q. The specialization to orthonormal bases is standard [DH01, ESQD05, 
SED05, HB12]. 

Instead of restricting our attention to specific choices of Q that are incoherent with the identity matrix, 
we consider an idealized model for incoherence where Q is a uniformly random orthogonal matrix. This 
formulation ensures that the structures in the two signals are oriented generically with respect to each 
other. Other authors have also used this approach to study incoherence [DH01, ESQD05].

We quantify the sparsity of the two constituent signals by fixing parameters $\tau_x$ and $\tau_y$ in
the interval $[0, 1]$ such that the unknown signals $x_0$ and $y_0$ satisfy

$$\operatorname{nnz}(x_0) = \lceil \tau_x d \rceil \quad\text{and}\quad \operatorname{nnz}(y_0) = \lceil \tau_y d \rceil,$$

where $\operatorname{nnz}(x)$ denotes the number of nonzero elements of $x$. In other words, $\tau_x$ and
$\tau_y$ measure the proportion of nonzero entries in $x_0$ and $y_0$. These sparsity parameters emerge
as the major factor that determines how hard it is to extract $x_0$ and $y_0$ from the observation $z_0$.
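The signal model above is straightforward to simulate. The following is a minimal sketch, assuming NumPy is available; the helper names `random_basis` and `sparse_vector`, the Gaussian values on the support, and the specific sizes are illustrative choices and are not taken from the paper. A uniformly random orthogonal matrix is drawn by a QR factorization of a Gaussian matrix with a sign correction.

```python
import numpy as np

def random_basis(d, rng):
    """Draw a d x d orthogonal matrix from the Haar measure via QR."""
    G = rng.standard_normal((d, d))
    Q, R = np.linalg.qr(G)
    # Rescale each column by the sign of the matching diagonal entry of R;
    # this removes the sign ambiguity of QR so the law of Q is Haar.
    return Q * np.sign(np.diag(R))

def sparse_vector(d, tau, rng):
    """Vector in R^d with ceil(tau * d) nonzero entries (Gaussian values assumed)."""
    k = int(np.ceil(tau * d))
    v = np.zeros(d)
    support = rng.choice(d, size=k, replace=False)
    v[support] = rng.standard_normal(k)
    return v

rng = np.random.default_rng(0)
d, tau_x, tau_y = 100, 0.10, 0.05     # illustrative values
Q = random_basis(d, rng)
x0 = sparse_vector(d, tau_x, rng)
y0 = sparse_vector(d, tau_y, rng)
z0 = x0 + Q @ y0                      # observation z0 = x0 + Q y0
```

Only the support sizes matter for the sparsity parameters; the distribution of the nonzero values is an assumption of this sketch.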

1.1.2 The constrained MCA deconvolution procedure 

The goal of morphological component analysis is to identify the pair $(x_0, y_0)$ of sparse vectors
given the observation $z_0$ and the basis $Q$. A natural technique for finding a sparse vector that
satisfies certain conditions is to minimize the $\ell_1$ norm subject to these constraints [CDS99],
where the $\ell_1$ norm is defined as $\|x\|_{\ell_1} := \sum_{i=1}^{d} |x_i|$.

Assume that we have access to side information $\alpha = \|y_0\|_{\ell_1}$. Then the intuition above
leads us to frame the following convex optimization problem for deconvolution:

$$\begin{aligned} &\text{minimize} && \|x\|_{\ell_1} \\ &\text{subject to} && \|y\|_{\ell_1} \le \alpha \ \text{ and } \ x + Q y = z_0, \end{aligned} \qquad (1.1)$$

where the decision variables are $x, y \in \mathbb{R}^d$. We call this optimization problem constrained
MCA, and we say that it succeeds if $(x_0, y_0)$ is the unique optimal point of (1.1). Since (1.1) can
be written as a linear
[Figure 1 plot: "Phase transition for success of constrained MCA ($d = 100$)". Horizontal axis: $\tau_x$, sparsity ratio for $x_0$.]

Figure 1 Performance of constrained MCA. The variables $\tau_x$ and $\tau_y$ on the axes represent the
fraction of components in $x_0$ and $y_0$ that are nonzero. The background shading indicates the
empirical probability that the constrained MCA problem (1.1) identifies the pair $(x_0, y_0)$ from the
observation $z_0 = x_0 + Q y_0$, where $Q$ is a random basis. The yellow curve marks the empirical 50%
success threshold. The green curve locates the theoretical phase transition for deconvolving a single
pair $(x_0, y_0)$. For sparsity levels below the blue curve, constrained MCA (1.1) provably recovers all
$(\tau_x, \tau_y)$-sparse pairs with high probability in the dimension. Further details are available in
Section 1.1.3 and Section 6.1.

program, constrained MCA offers a tractable procedure for attempting to identify $(x_0, y_0)$, provided
the observation $z_0$, the basis $Q$, and the side information $\alpha = \|y_0\|_{\ell_1}$.

Constrained MCA is closely related to the standard MCA procedure, which is a Lagrangian formulation
of (1.1) that does not require the side information $\alpha$ [SED05, Eq. (4)]. The constrained problem
(1.1) is more powerful than the standard MCA procedure, so it provides hard limits on the effectiveness
of the usual approach. In most cases, the two methods are equivalent, provided that we can choose the
Lagrange multiplier correctly, which is a nontrivial task in itself. See Section 1.2.3 for more details.
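To make the linear-programming claim concrete, here is a minimal sketch of constrained MCA (1.1), assuming NumPy and SciPy are available. The function name `constrained_mca`, the variable-splitting reformulation, and the instance sizes are illustrative assumptions; the paper does not prescribe a solver. Each vector is split into nonnegative positive and negative parts so that both $\ell_1$ norms become linear.

```python
import numpy as np
from scipy.optimize import linprog

def constrained_mca(z0, Q, alpha):
    """Solve (1.1): minimize ||x||_1 s.t. ||y||_1 <= alpha and x + Q y = z0."""
    d = z0.size
    I = np.eye(d)
    # Nonnegative variables [x_plus, x_minus, y_plus, y_minus], with
    # x = x_plus - x_minus and y = y_plus - y_minus.
    cost = np.concatenate([np.ones(2 * d), np.zeros(2 * d)])
    A_eq = np.hstack([I, -I, Q, -Q])                        # x + Q y = z0
    A_ub = np.concatenate([np.zeros(2 * d), np.ones(2 * d)])[None, :]
    res = linprog(cost, A_ub=A_ub, b_ub=[alpha], A_eq=A_eq, b_eq=z0,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    v = res.x
    return v[:d] - v[d:2 * d], v[2 * d:3 * d] - v[3 * d:]

# Illustrative instance: sizes, seed, and Gaussian nonzeros are arbitrary choices.
rng = np.random.default_rng(0)
d, k = 100, 3
G = rng.standard_normal((d, d))
Q, R = np.linalg.qr(G)
Q = Q * np.sign(np.diag(R))                  # Haar-distributed orthogonal matrix
x0 = np.zeros(d)
x0[rng.choice(d, k, replace=False)] = rng.standard_normal(k)
y0 = np.zeros(d)
y0[rng.choice(d, k, replace=False)] = rng.standard_normal(k)
z0 = x0 + Q @ y0

x_hat, y_hat = constrained_mca(z0, Q, alpha=np.abs(y0).sum())
recovery_error = max(np.abs(x_hat - x0).max(), np.abs(y_hat - y0).max())
```

At this very low sparsity level the instance lies far inside the empirical success region of Figure 1, so `recovery_error` should be at the level of solver tolerance. Recovery is a high-probability statement over the draw of $Q$, not a certainty.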

1.1.3 Numerical and theoretical results for constrained MCA 

Figure 1 displays the result of a numerical experiment on constrained MCA. We fix the dimension
$d = 100$. For sparsity levels $(\tau_x, \tau_y)$ varying over the unit square $[0, 1]^2$, we form vectors
$x_0$ and $y_0$ with sparsity levels $\operatorname{nnz}(x_0) = \lceil \tau_x d \rceil$ and
$\operatorname{nnz}(y_0) = \lceil \tau_y d \rceil$. (The manner in which we choose the nonzero entries is
irrelevant.) We draw a random basis $Q$ and construct the observation $z_0 = x_0 + Q y_0$. Then we solve
the constrained MCA problem (1.1) to identify the pair $(x_0, y_0)$. The background of the figure shows
the empirical probability of success over the randomness in $Q$; dark areas denote low probability of
success, while light areas denote high success rates. The yellow curve marks the 50% success threshold.
Figure 2 Atomic gauge. (a) Let $\mathcal{A}$ be an atomic set consisting of five atoms (stars). The
"unit ball" of the atomic gauge $f_{\mathcal{A}}$ is the closed convex hull of $\mathcal{A}$ (heavy
line). Other level sets (dashed lines) of $f_{\mathcal{A}}$ are dilations of the unit ball. (b) At an
atom (star), the unit ball of $f_{\mathcal{A}}$ tends to have sharp corners. Most perturbations away
from this atom increase the value of $f_{\mathcal{A}}$, so the atomic gauge is an effective measure of
the complexity of an atomic signal.

This work establishes two theoretical results for constrained MCA. The first result provides a phase
transition curve, parameterized by the sparsity $(\tau_x, \tau_y)$, for the probability that constrained
MCA will deconvolve a single pair $(x_0, y_0)$ from the associated observation $z_0$. This weak bound is
marked by the green line in Figure 1. Observe that the green line coincides almost perfectly with the
empirical phase transition.

Second, we establish a strong bound. For a fixed instantiation of the random basis $Q$, with high
probability, constrained MCA (1.1) can identify every sufficiently sparse pair $(x_0, y_0)$ from the
associated observation $z_0$. The blue curve in the bottom left corner of Figure 1 is a lower estimate
for the sparsity pairs $(\tau_x, \tau_y)$ where this uniform guarantee holds. Section 6.1 provides the
details regarding the computation of the weak and strong bounds as well as a fully detailed description
of our numerical experiment.

1.2 A recipe for deconvolution 

This work is not primarily about MCA. We are interested in developing methods that apply to a whole 
spectrum of deconvolution problems. The following two sections describe how to construct a convex 
program that can separate two structured signals. 

1.2.1 Structured signals and atomic gauges 

The $\ell_1$ norm is a convex complexity measure that tends to be small near sparse vectors, so we can
minimize the $\ell_1$ norm to promote sparsity. We now describe a method for building complexity
measures that are appropriate for other types of structure. This construction was originally introduced
in the nonlinear approximation literature [DT96, Tem03]. The recent paper [CRPW10] explains how to apply
these ideas to solve signal processing problems.

In practice, we often encounter signals that are formed as a positive linear combination of a few
elementary structures, called atoms, drawn from a fixed collection. For example, a sparse vector in
$\mathbb{R}^d$ is a conic combination of a small number of elements from the set
$\{\pm e_i : i = 1, \dots, d\}$ of signed basis vectors. We want to construct a function that reflects the
complexity of an atomic signal.
We define the atomic gauge of a vector $x \in \mathbb{R}^d$ with respect to a set
$\mathcal{A} \subset \mathbb{R}^d$ of atoms by

$$f_{\mathcal{A}}(x) := \inf\{\lambda \ge 0 : x \in \lambda \cdot \operatorname{conv}(\mathcal{A})\},$$

with the convention that $f_{\mathcal{A}}(x) = +\infty$ if the set is empty. Then $f_{\mathcal{A}}$ is a
homogeneous convex function. The "unit ball" of $f_{\mathcal{A}}$ is $\operatorname{conv}(\mathcal{A})$,
and the level sets of $f_{\mathcal{A}}$ are dilations of this unit ball. The atomic gauge
$f_{\mathcal{A}}$ is a norm if and only if $\operatorname{conv}(\mathcal{A})$ is a bounded, symmetric set
that contains zero in its interior.

The convex hull of an atomic set $\mathcal{A}$ tends to have sharp corners at atoms; see Figure 2. At these sharp
points, most perturbations of the objective increase the value of the gauge, so the atomic gauge tends to 
take small values at atoms. Similar behavior occurs at a signal comprised of a relatively small number of 
atoms. This observation is a key reason that atomic gauges make good complexity measures for atomic 
signals [CRPW10]. 

Some common atomic gauges include 

• The $\ell_1$ norm. The $\ell_1$ norm on $\mathbb{R}^d$ is the atomic gauge generated by the set
$\mathcal{A} = \{\pm e_i : i = 1, \dots, d\}$ of signed standard basis vectors. This norm is widely used
to promote sparsity [CDS99, CRT06, Don06a]. The $\ell_1$ norm may also be defined for matrices via the
formula $\|X\|_{\ell_1} = \sum_{i,j} |X_{ij}|$. In this context, the $\ell_1$ norm reflects the sparsity
of matrices [CSPW09, CPW10, CSPW11].

• The $\ell_\infty$ norm. The $\ell_\infty$ norm on $\mathbb{R}^d$, given by
$\|x\|_{\ell_\infty} = \max_{i=1,\dots,d} |x_i|$, is the atomic gauge generated by the set
$\mathcal{A} = \{\pm 1\}^d \subset \mathbb{R}^d$ of all $2^d$ sign vectors. We use this norm to deconvolve
binary codewords. See also [DT10a, CRPW10, MR11]. For matrices, the $\ell_\infty$ norm returns
$\|X\|_{\ell_\infty} = \max_{i,j} |X_{ij}|$; this function is the atomic gauge generated by the set of
sign matrices.

• The Schatten 1-norm. The Schatten 1-norm on $\mathbb{R}^{m \times n}$ is the sum of the singular values
of a matrix. It is the atomic gauge generated by the set of rank-one matrices in
$\mathbb{R}^{m \times n}$ with unit Frobenius norm. Minimizing the Schatten 1-norm promotes low rank
[Faz02, RFP10].

• The operator norm. The operator norm returns the maximum singular value of a matrix. On the space
$\mathbb{R}^{n \times n}$ of square matrices, the operator norm is the atomic gauge generated by the set
$\mathsf{O}_n$ of orthogonal matrices. This norm can be used to search for orthogonal matrices
[CRPW10, Prop. 3.13].

Our applications focus on these four instances, but a dizzying variety of other atomic gauges are available.
We refer to [CRPW10] for further examples. 
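The four gauges listed above are easy to evaluate numerically. The sketch below, assuming NumPy, evaluates each gauge at one of its own atoms, where the value should equal one because an atom lies on the boundary of the unit ball. The function names and the test atoms are illustrative choices.

```python
import numpy as np

def l1_gauge(x):
    """Atoms: signed standard basis vectors (also works entrywise on matrices)."""
    return np.abs(x).sum()

def linf_gauge(x):
    """Atoms: sign vectors (or sign matrices)."""
    return np.abs(x).max()

def schatten1_gauge(X):
    """Atoms: rank-one matrices with unit Frobenius norm."""
    return np.linalg.svd(X, compute_uv=False).sum()

def operator_gauge(X):
    """Atoms: orthogonal matrices (square case)."""
    return np.linalg.svd(X, compute_uv=False).max()

rng = np.random.default_rng(1)
d = 6
basis_atom = np.zeros(d)
basis_atom[2] = -1.0                                   # the atom -e_3
sign_atom = rng.choice([-1.0, 1.0], size=d)            # a sign vector
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
rank_one_atom = np.outer(u, v)                         # unit Frobenius norm
orth_atom, _ = np.linalg.qr(rng.standard_normal((d, d)))

atom_values = [
    l1_gauge(basis_atom),
    linf_gauge(sign_atom),
    schatten1_gauge(rank_one_atom),
    operator_gauge(orth_atom),
]
```

Each entry of `atom_values` should equal one up to rounding error, and scaling an atom by a positive constant scales its gauge by the same constant, which reflects positive homogeneity.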

1.2.2 Formulating a convex deconvolution method 

We are ready to introduce a computational framework for deconvolving structured signals. This approach 
unifies several related procedures that appear in the literature. See, for example, [JRSR10, CSPW11]. 
Assume we observe the superposition of two structured signals: 

$$z_0 = x_0 + U y_0,$$

where $U$ is a known orthogonal matrix and the pair $(x_0, y_0)$ is unknown. We include the basis $U$ in
the formalism because it allows us to model incoherence. Our goal is to deconvolve the pair
$(x_0, y_0)$ from the observation $z_0$.

Let $f$ and $g$ be convex complexity measures, such as atomic gauges, associated with the structures
we expect to find in $x_0$ and $y_0$. Suppose we have access to the additional side information
$\alpha = g(y_0)$. We combine these ingredients to reach the following convex deconvolution method:

$$\begin{aligned} &\text{minimize} && f(x) \\ &\text{subject to} && g(y) \le \alpha \ \text{ and } \ x + U y = z_0, \end{aligned} \qquad (1.2)$$

where the decision variables are $x, y \in \mathbb{R}^d$. The display (1.2) describes a convex program
because $f$ and $g$ are convex functions. We say that the convex deconvolution method (1.2) succeeds at
deconvolving $(x_0, y_0)$, or simply succeeds, if $(x_0, y_0)$ is the unique optimal point of (1.2);
otherwise, it fails.
In this work, we develop conditions that describe when the convex deconvolution method (1.2) succeeds
and when it fails. Lemma 2.3 provides a deterministic characterization of the success and failure
of this method. The spherical kinematic formula, Fact 3.5, provides an exact expression for the proba- 
bility that method (1.2) succeeds under a random model for incoherence. The main results of this work, 
Theorems 4.3 and 4.4, show that two geometric parameters characterize success and failure of our convex 
deconvolution method under this random incoherence model. Section 5 describes how to evaluate these 
parameters in many interesting cases. 

1.2.3 The Lagrangian counterpart 

In practice, the value $\alpha = g(y_0)$ may not be known, so we might prefer to replace the convex
deconvolution method (1.2) with its Lagrangian relative

$$\begin{aligned} &\text{minimize} && f(x) + \lambda \cdot g(y) \\ &\text{subject to} && x + U y = z_0, \end{aligned} \qquad (1.3)$$

where $\lambda > 0$ is a regularization parameter that must be specified. The constrained problem (1.2)
is slightly more powerful than (1.3), so its performance dominates the Lagrangian formulation (1.3). It
is well known that (1.2) and (1.3) are equivalent when the regularization parameter $\lambda$ is chosen
correctly and a mild regularity condition holds; see Appendix A for details. Nevertheless, it is
nontrivial to determine the Lagrange multiplier $\lambda$. Cross validation [BDB07] offers a principled
approach for determining $\lambda$, but it may be costly to implement.

1.3 One hammer, many nails

The convex deconvolution method (1.2) includes many interesting special cases. Our analysis provides
detailed information about when (1.2) is able to separate two structured, incoherent signals. We now
describe some applications of this machinery.

1.3.1 A secure communications protocol that is robust to sparse errors

Suppose we wish to securely transmit a binary message across a communications channel. We can obtain
strong guarantees of security by modulating the message with a random rotation before transmission
[Wyn79a, Wyn79b]. Our theory shows that decoding the message via deconvolution also makes this
secure scheme perfectly robust to sparse corruptions such as erasures or malicious interference.

Consider the following simple communications protocol. We model the binary message as a sign
vector $m_0 \in \{\pm 1\}^d$. Choose a random basis $Q \in \mathsf{O}_d$. The transmitter sends the
scrambled message $s_0 = Q m_0$ across the channel, where it is corrupted by an unknown sparse vector
$c_0 \in \mathbb{R}^d$. The receiver must determine the original message given only the corrupted signal

$$z_0 = s_0 + c_0 = Q m_0 + c_0$$

and knowledge of the scrambling matrix $Q$.

This signal model is perfectly suited to the deconvolution recipe of Section 1.2. The discussion in
Section 1.2.1 indicates that the $\ell_1$ and $\ell_\infty$ norms are natural complexity measures for the
structured signals $c_0$ and $m_0$. Since the message $m_0$ is a sign vector, we also have the side
information $\|m_0\|_{\ell_\infty} = 1$. Our receiver then recovers the message with the convex
deconvolution method

$$\begin{aligned} &\text{minimize} && \|c\|_{\ell_1} \\ &\text{subject to} && \|m\|_{\ell_\infty} \le 1 \ \text{ and } \ Q m + c = z_0, \end{aligned} \qquad (1.4)$$

where the decision variables are $c, m \in \mathbb{R}^d$. This method succeeds if $(c_0, m_0)$ is the
unique optimal point of (1.4).
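The decoder (1.4) is also a linear program. The following sketch, assuming NumPy and SciPy, eliminates $c = z_0 - Q m$ and bounds its entries in absolute value by auxiliary variables $t$; the function name `decode`, this particular reformulation, and the instance sizes are illustrative assumptions and not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def decode(z0, Q):
    """Solve (1.4): minimize ||c||_1 s.t. ||m||_inf <= 1 and Q m + c = z0."""
    d = z0.size
    I = np.eye(d)
    # Variables [m, t] with t >= |z0 - Q m| entrywise; minimize sum(t).
    cost = np.concatenate([np.zeros(d), np.ones(d)])
    A_ub = np.vstack([np.hstack([-Q, -I]),    #   z0 - Q m  <= t
                      np.hstack([Q, -I])])    # -(z0 - Q m) <= t
    b_ub = np.concatenate([-z0, z0])
    bounds = [(-1.0, 1.0)] * d + [(0.0, None)] * d
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    m_hat = res.x[:d]
    return m_hat, z0 - Q @ m_hat

# Illustrative benign corruption: sizes, seed, and values are arbitrary choices.
rng = np.random.default_rng(2)
d, k = 100, 2
G = rng.standard_normal((d, d))
Q, R = np.linalg.qr(G)
Q = Q * np.sign(np.diag(R))                    # random basis
m0 = rng.choice([-1.0, 1.0], size=d)           # binary message as a sign vector
c0 = np.zeros(d)
c0[rng.choice(d, k, replace=False)] = rng.standard_normal(k)
z0 = Q @ m0 + c0

m_hat, c_hat = decode(z0, Q)
decoded = np.sign(m_hat)
```

With only 2% of the entries corrupted, this instance sits well below the benign threshold discussed in this section, so `decoded` should agree with `m0`. As with any statement about a random basis, this holds with high probability, not with certainty.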
[Figure 3 plot: "Phase transition for channel coding".]

Figure 3 Performance of the robust communication protocol. The heavy curves indicate the empirical
probability that problem (1.4) decodes a $d$-bit message $m_0 \in \{\pm 1\}^d$ from the observation
$z_0 = Q m_0 + c_0$, where $Q$ is a random basis. The variable $\tau$ on the horizontal axis measures the
proportion of nonzero entries in the corruption $c_0$. A benign corruption is independent of $Q$, while
an adversarial erasure zeros out the $\lceil \tau d \rceil$ largest components of the transmitted
message. The vertical line at $\tau \approx 0.19$ marks the theoretical phase transition for successful
decoding under a benign corruption. The vertical line at $\tau = 0.018$ provides a uniform guarantee
applicable to adversarial corruptions. For $\tau < 0.018$, with high probability, our protocol will
decode a message subject to any $\tau$-sparse corruption whatsoever. See Section 1.3.1 and Section 6.2
for more details.



In Section 6.2, we apply the general theory developed in this work to study this communications
protocol. Before summarizing the results of this analysis, we fix some notation. Suppose the corruption
$c_0 \in \mathbb{R}^d$ is $\tau$-sparse; that is, $\operatorname{nnz}(c_0) = \lceil \tau d \rceil$ for
some $\tau \in [0, 1]$. We further distinguish between two types of corruption. A benign corruption
$c_0$ is independent of the scrambling matrix $Q$. In contrast, an adversarial corruption may depend on
both $Q$ and $m_0$. Adversarial corruptions also include nonlinear effects that are not necessarily
malicious. For example, we can model an erasure at the $i$th time instant by taking
$(c_0)_i = -(Q m_0)_i$.

Figure 3 presents the results of a numerical experiment on this communications protocol; the complete
experimental procedure is detailed in Section 6.2.3. Briefly, we consider messages of length $d = 100$
and $d = 300$, and we let the sparsity $\tau$ range over the interval $[0, 0.35]$. We test the benign
case by adding a $\tau$-sparse corruption that is independent from $Q$. We also consider a particular
adversarial corruption in which we set the $\lceil \tau d \rceil$ largest-magnitude entries in the
transmitted message $s_0$ to zero. The curves indicate the empirical probability that the protocol
succeeds as a function of $\tau$.
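The adversarial erasure used in this experiment can be written as a sparse corruption of the form $(c_0)_i = -(Q m_0)_i$ on the erased coordinates. The sketch below, assuming NumPy, constructs it; the function name `adversarial_erasure` and the instance parameters are illustrative.

```python
import numpy as np

def adversarial_erasure(s0, tau):
    """Corruption that zeros the ceil(tau * d) largest-magnitude entries of s0."""
    k = int(np.ceil(tau * s0.size))
    c = np.zeros_like(s0)
    if k > 0:                                  # guard: [-0:] would select everything
        idx = np.argsort(np.abs(s0))[-k:]      # indices of the k largest magnitudes
        c[idx] = -s0[idx]                      # (c0)_i = -(Q m0)_i erases entry i
    return c

rng = np.random.default_rng(4)
d, tau = 100, 0.1                              # illustrative values
G = rng.standard_normal((d, d))
Q, R = np.linalg.qr(G)
Q = Q * np.sign(np.diag(R))                    # random basis
m0 = rng.choice([-1.0, 1.0], size=d)
s0 = Q @ m0                                    # transmitted message
c0 = adversarial_erasure(s0, tau)
z0 = s0 + c0                                   # received message with erasures
```

This corruption depends on both $Q$ and $m_0$ through $s_0$, which is what distinguishes it from the benign case.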

In the benign case, our theory shows that there exists a phase transition in the success probability of
the convex deconvolution method (1.4) at sparsity level $\tau \approx 0.19$. The empirical 50% failure
threshold for benign corruptions closely matches this prediction. In the adversarial case, our results
guarantee that with high probability, our protocol will tolerate all corruptions that affect no more
than 1.8% of the components in the received message $z_0$. This bound is conservative for the type of
adversarial corruption in the numerical experiment; this is not surprising because we may not have
constructed the worst possible corruption.

[Figure 4 plot: "Bounds for success of low-rank and sparse deconvolution". Horizontal axis: $\rho$, ratio of rank to edge length of $X_0$ ($\rho = r/n$).]

Figure 4 Low-rank matrix recovery with sparse corruptions. The horizontal axis is the normalized rank
$\rho = \operatorname{rank}(X_0)/n$, and the vertical axis is the sparsity level
$\tau = \operatorname{nnz}(Y_0)/n^2$. The intensity of the background denotes the empirical probability
that (1.5) recovers $(X_0, Y_0)$ from the observation $Z_0 = X_0 + \Omega(Y_0)$. In the region below the
green curve, the convex deconvolution method (1.5) recovers a low-rank matrix $X_0$ from a randomly
rotated sparse corruption $\Omega(Y_0)$ with high probability. See Section 1.3.2 and Section 6.3 for more
details.

1.3.2 Low-rank matrix recovery with generic sparse corruptions

Consider now the matrix observation $Z_0 = X_0 + \Omega(Y_0) \in \mathbb{R}^{n \times n}$, where $X_0$
has low rank, $Y_0$ is sparse, and $\Omega$ is a random rotation on $\mathbb{R}^{n \times n}$. This type
of signal provides a stylized model for applications such as latent variable selection [CPW10, CSPW11]
and robust principal component analysis [CLMW11]. In these settings, $X_0$ has low rank because the
underlying data is drawn from a low-dimensional linear model, while $\Omega(Y_0)$ represents a
corruption that is sparse in a random basis. We aim to discover the matrix $X_0$ given the corrupted
observation $Z_0$ and the basis $\Omega$.

We follow the now-familiar pattern of Section 1.2. The Schatten 1-norm $\|\cdot\|_{S_1}$ serves as a
natural complexity measure for the low-rank structure of $X_0$, and the matrix $\ell_1$ norm
$\|\cdot\|_{\ell_1}$ is appropriate for the sparse structure of $Y_0$. We further assume the side
information $\alpha = \|Y_0\|_{\ell_1}$. We then solve

$$\begin{aligned} &\text{minimize} && \|X\|_{S_1} \\ &\text{subject to} && \|Y\|_{\ell_1} \le \alpha \ \text{ and } \ X + \Omega(Y) = Z_0. \end{aligned} \qquad (1.5)$$

This convex deconvolution method succeeds if $(X_0, Y_0)$ is the unique solution to (1.5).

Figure 4 displays the results of a numerical experiment on this approach to rank-sparsity
deconvolution. We take the matrix side length $n = 35$ and draw a random rotation $\Omega$ for
$\mathbb{R}^{n \times n}$. For parameters $0 \le \rho, \tau \le 1$, we generate matrices $X_0$ and $Y_0$
such that $\operatorname{rank}(X_0) = \lceil \rho n \rceil$ and
$\operatorname{nnz}(Y_0) = \lceil \tau n^2 \rceil$. The background shading indicates the empirical
probability that (1.5) succeeds given the observation $Z_0 = X_0 + \Omega(Y_0)$. We mark the empirical
50% success probability with a yellow curve. See Section 6.3.1 for the experimental details.

The results of this work show that, with high probability, program (1.5) succeeds so long as the pair
$(\rho, \tau)$ lies below the green curve on Figure 4. When the rank parameter $\rho$ is small, our
theoretical bound closely tracks the phase transition visible in the numerical experiment, although the
bound appears loose when $\rho$ is larger. We refer the reader to Section 6.3 for more information.

1.3.3 Matrix deconvolution mix-and-match 

Our results are not restricted to the convex deconvolution methods (1.1), (1.4) or (1.5). Let us mention 
a few other situations we can analyze using the theory developed in this work. With a tractable convex 
program, it is possible to deconvolve 

• An orthogonal matrix from a matrix that is sparse in a random basis, 

• A randomly oriented sign matrix from a sufficiently low-rank matrix, and

• A randomly rotated low-rank matrix from an orthogonal matrix.

See Section 6.4 for the details. 

1.4 Theoretical insights 

Our approach reveals a number of theoretical insights. 

Design of convex deconvolution methods for incoherent structures. In the incoherent regime we ana- 
lyze, the parameters that determine when the convex deconvolution method (1.2) succeeds reflect 
the structures in the constituent signals and the associated complexity measures. These summary 
parameters are independent of the relationship between the two incoherent structures. We discuss 
this fact and its consequences for the design of deconvolution procedures in Section 4.2.1. 

Connection with linear inverse problems. The parameters that determine success of the convex
deconvolution method (1.2) are closely related to the number of random linear measurements required
to identify a structured signal. In Section 5.2, we leverage this relationship to compute these
parameters from Gaussian width bounds developed in [CRPW10].

Phase transitions. Our theory indicates that there is often a phase transition in the behavior of the decon- 
volution method (1.2). This phenomenon is related to a conjecture in spherical integral geometry; 
see Section 4.2.2 for a discussion of this point. 

1.5 Outline 

This work begins with deconvolution in the deterministic setting. Section 2 describes the geometry of the 
convex deconvolution method (1.2) and provides a geometric characterization of successful deconvolu- 
tion. 

Section 3 presents a random model for incoherence along with some techniques from spherical in- 
tegral geometry that allow us to analyze this model. In Section 4, these ideas yield theory that predicts 
success and failure regimes for the convex deconvolution method (1.2). Section 5 develops methods for 
computing the parameters necessary to apply the theorems of Section 4. 

In Section 6, we analyze the application problems described in Sections 1.1 and 1.3. Section 7 con- 
cludes with a discussion of this work's place in the literature and future directions. 
1.6 Notation and conventions

All variables are real valued. Bold lowercase letters represent vectors, and bold capital letters are
matrices. The $i$th element of a vector is written $x_i$ or $(x)_i$, while the $(i, j)$th element of a
matrix is $X_{ij}$ or $(X)_{ij}$. We express the transpose of $X$ as $X^*$.

The symbol $\|\cdot\|_{\ell_p}$ stands for the $\ell_p$ vector norm on $\mathbb{R}^d$, defined by
$\|x\|_{\ell_p}^p := \sum_{i=1}^{d} |x_i|^p$ when $1 \le p < \infty$ and
$\|x\|_{\ell_\infty} := \max_{i=1,\dots,d} |x_i|$ when $p = \infty$. The $\ell_p$ norm of a matrix treats
the matrix as a vector and applies the corresponding vector $\ell_p$ norm. The Schatten 1-norm
$\|X\|_{S_1}$ is the sum of the singular values of a matrix $X$, while the operator, or spectral, norm
$\|X\|_{\mathrm{op}}$ returns the maximum singular value of $X$.

We reserve the symbols $f$ and $g$ for convex complexity measures. A convex function may take the
value $+\infty$, but we assume that all convex functions are proper; that is, each convex function takes
on at least one finite value and never takes the value $-\infty$.

We deal frequently with convex cones, which are positively homogeneous convex sets. For any cone
$K$, we define the polar cone

$$K^\circ := \{y : \langle y, x \rangle \le 0 \ \text{for all } x \in K\}.$$

A polyhedral cone is the intersection of a finite number of closed halfspaces, each containing the
origin. One important polyhedral cone is the nonnegative orthant, defined by

$$\mathbb{R}^d_+ := \{x \in \mathbb{R}^d : x_i \ge 0, \ i = 1, \dots, d\}.$$

For a set $A \subset \mathbb{R}^d$, we write $\operatorname{conv}(A)$ for its convex hull and
$\overline{A}$ for its closure.

The Euclidean unit sphere in $\mathbb{R}^d$ is the set $\mathsf{S}^{d-1}$. The orthogonal group, the set
of $d \times d$ orthogonal matrices, is denoted $\mathsf{O}_d$. The subset of $\mathsf{O}_d$ with
determinant one (the special orthogonal group) is $\mathsf{SO}_d$. The term basis refers to a member of
$\mathsf{O}_d$. In the sequel, the letter $U$ will always refer to a basis.

The symbol $\mathbb{P}$ denotes the probability of an event. Gaussian vectors and matrices have
independent standard normal entries. We define a random basis as a matrix drawn from the Haar measure
on $\mathsf{O}_d$, and we reserve the letter $Q$ for a random basis.

A special note is in order when our observations are matrices. The space of matrices
$\mathbb{R}^{m \times n}$ is equipped with a natural isomorphism to $\mathbb{R}^{mn}$ through the
$\operatorname{vec}(\cdot)$ operator, which stacks the columns of a matrix on top of one another to form
a very tall vector. We define a random basis $\Omega$ for $\mathbb{R}^{m \times n}$ by
$\Omega(Y) = \operatorname{vec}^{-1}(Q \operatorname{vec}(Y))$, where $Q$ is a random basis for
$\mathbb{R}^{mn}$. It is easily verified that such an $\Omega$ is a random basis for
$\mathbb{R}^{m \times n}$ equipped with the Euclidean structure induced by the Frobenius norm.
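The construction of a random basis for matrices can be checked numerically. The sketch below, assuming NumPy, implements $\operatorname{vec}$ with column-major reshaping and confirms that the resulting map preserves the Frobenius inner product. The helper names `vec`, `unvec`, and `random_matrix_basis` and the sizes are illustrative.

```python
import numpy as np

def vec(Y):
    """Stack the columns of Y into one long vector (column-major order)."""
    return Y.reshape(-1, order="F")

def unvec(v, shape):
    """Inverse of vec for a matrix of the given shape."""
    return v.reshape(shape, order="F")

def random_matrix_basis(m, n, rng):
    """Return the map Omega(Y) = unvec(Q vec(Y)) for a Haar-random Q on R^{mn}."""
    G = rng.standard_normal((m * n, m * n))
    Q, R = np.linalg.qr(G)
    Q = Q * np.sign(np.diag(R))           # sign fix so that Q is Haar distributed
    return lambda Y: unvec(Q @ vec(Y), (m, n))

rng = np.random.default_rng(3)
m, n = 4, 5                               # illustrative sizes
Omega = random_matrix_basis(m, n, rng)
Y1 = rng.standard_normal((m, n))
Y2 = rng.standard_normal((m, n))

inner_before = np.sum(Y1 * Y2)                    # Frobenius inner product
inner_after = np.sum(Omega(Y1) * Omega(Y2))
```

Because $Q$ is orthogonal, `inner_before` and `inner_after` agree up to rounding error, which is the sense in which $\Omega$ acts as a rotation on the space of matrices.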

2 The geometry of deconvolution 

This short section lays the geometric foundation for the rest of this work. Section 2.1 describes the local 
behavior of convex functions in terms of special convex cones. In Section 2.2, this geometric view yields a 
concise characterization of successful deconvolution in terms of the configuration of two cones. 

2.1 Feasible cones 

The success of the convex deconvolution method (1.2) depends on the properties of the complexity
measures $f$ and $g$ at the structured vectors $x_0$ and $y_0$. The following definition captures the
local behavior of a convex function.

Definition 2.1 (Feasible cone). The feasible cone of a convex function $f$ at a point $x$ is defined as
the cone of directions in which $f$ is locally nonincreasing about $x$:

$$\mathcal{F}(f, x) := \bigcup_{\lambda > 0} \{\delta : f(x + \lambda\delta) \le f(x)\}.$$
Figure 5 Geometry of deconvolution. (a) Deconvolution succeeds: Every feasible perturbation about
$x_0$ (gray area) increases the objective function (green level lines). (b) Deconvolution fails: Some
feasible perturbations decrease the objective value. In each panel, the success or failure of the
convex deconvolution method (1.2) is determined by a configuration of two cones. This fact forms the
content of Lemma 2.3.

The feasible cone is always a convex cone containing zero, but it is not necessarily closed. If
$\mathcal{A}$ is a set of atoms and $a \in \mathcal{A}$, the feasible cone
$\mathcal{F}(f_{\mathcal{A}}, a)$ of the atomic gauge $f_{\mathcal{A}}$ at the atom $a$ tends to be
small because the unit ball of $f_{\mathcal{A}}$ is the smallest convex set containing all of the atoms.
See Figure 2 for an illustration. The positive homogeneity of atomic gauges further implies that the
feasible cones of atomic gauges do not depend on the scaling of a vector, in the sense that

$$\mathcal{F}(f_{\mathcal{A}}, x) = \mathcal{F}(f_{\mathcal{A}}, \lambda \cdot x)$$

for all $x \in \mathbb{R}^d$ and any $\lambda > 0$.
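For the $\ell_1$ norm, membership in the feasible cone has a simple closed form. Because the $\ell_1$ norm is piecewise linear, $\|x + \lambda\delta\|_{\ell_1} = \|x\|_{\ell_1} + \lambda \cdot D$ for all sufficiently small $\lambda > 0$, where $D$ is the one-sided directional derivative; hence $\delta$ lies in the feasible cone exactly when $D \le 0$. The sketch below, assuming NumPy, implements this test. The function names and the small tolerance on the boundary are assumptions of the sketch.

```python
import numpy as np

def l1_directional_derivative(x, delta):
    """One-sided directional derivative of the l1 norm at x in direction delta."""
    on = x != 0
    return np.sum(np.sign(x[on]) * delta[on]) + np.sum(np.abs(delta[~on]))

def in_l1_feasible_cone(x, delta, tol=1e-12):
    """True if ||x + lam * delta||_1 <= ||x||_1 for all small enough lam > 0."""
    return l1_directional_derivative(x, delta) <= tol

x = np.array([2.0, -1.0, 0.0, 0.0])
delta_in = np.array([-1.0, 1.0, 0.5, -0.5])   # shrinks the support faster than it leaks
delta_out = np.array([0.0, 0.0, 1.0, 0.0])    # moves only off the support

inside = in_l1_feasible_cone(x, delta_in)
outside = in_l1_feasible_cone(x, delta_out)
same_after_scaling = in_l1_feasible_cone(5.0 * x, delta_in)
```

Here `inside` and `same_after_scaling` hold while `outside` does not, which illustrates both the definition and the scale invariance stated above.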

Remark 2.2. Definition 2.1 is equivalent to the definition of the "tangent cone" appearing in [CRPW10,
Eq. (8)]. However, that definition differs slightly from the standard definition of a tangent cone; cf.
[RW98, Thm. 6.9]. The cone of feasible directions [Pat00, p. 33] is the closest relative of the feasible
cone that we have identified in the literature, and this is the source of our terminology.

2.2 A geometric characterization of optimality 

The following lemma provides a geometric characterization for success in the convex deconvolution 
method (1.2) in terms of the configuration of two feasible cones. This is the main result of this section. 

Lemma 2.3. Program (1.2) succeeds at deconvolving $(x_0, y_0)$ if and only if
$\mathcal{F}(f, x_0) \cap (-U \mathcal{F}(g, y_0)) = \{0\}$.

In words, the deconvolution method (1.2) succeeds if and only if the two feasible cones are rotated
so that they intersect trivially. Note that many bases $U$ satisfy this condition when the two feasible
cones are small. This observation provides further support for choosing atomic gauges as our complexity
measures. We illustrate Lemma 2.3 in Figure 5.

The proof of Lemma 2.3 requires two technical propositions. The first is an alternative characteriza- 
tion of feasible cones. 

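Proposition 2.4. Let $f$ be a convex function. Then $\delta \in \mathcal{F}(f, x)$ if and only if there is a number $\lambda_0 > 0$ such 
that, for all $\lambda \in [0, \lambda_0]$, we have $f(x + \lambda\delta) \le f(x)$. 

Proof of Proposition 2.4. The "if" part is immediate: given any such $\lambda_0$, the assumption $f(x + \lambda_0\delta) \le f(x)$ 
implies $\delta \in \mathcal{F}(f, x)$ by the definition of feasible cones. 

For the other direction, suppose $\delta \in \mathcal{F}(f, x)$. Then the definition of a feasible cone ensures that there 
exists a number $\lambda_0 > 0$ for which $f(x + \lambda_0\delta) \le f(x)$. Suppose $\lambda \in [0, \lambda_0]$. By the convexity of $f$, we have 

\[ f(x + \lambda\delta) = f\Big( \big(1 - \tfrac{\lambda}{\lambda_0}\big) x + \tfrac{\lambda}{\lambda_0} (x + \lambda_0\delta) \Big) 
   \le \big(1 - \tfrac{\lambda}{\lambda_0}\big) f(x) + \tfrac{\lambda}{\lambda_0} f(x + \lambda_0\delta). \]

Applying the inequality $f(x + \lambda_0\delta) \le f(x)$, we find that the last expression above is at most $f(x)$. This is 
the result. □ 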
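
As a small numerical illustration of Proposition 2.4 (a sketch, not part of the original argument), the snippet below takes $f$ to be the $\ell_1$ norm and checks on a grid that a direction which does not increase $f$ at step $\lambda_0$ also does not increase it at any smaller step. The helper names `l1_norm` and `stays_feasible` and the test vectors are assumptions made for this example.

```python
def l1_norm(v):
    """The l1 norm, used here as the example convex function f."""
    return sum(abs(t) for t in v)


def stays_feasible(f, x, delta, lam0, grid=50, tol=1e-12):
    """Check f(x + lam * delta) <= f(x) for lam on a uniform grid in [0, lam0]."""
    fx = f(x)
    for j in range(grid + 1):
        lam = lam0 * j / grid
        point = [xi + lam * di for xi, di in zip(x, delta)]
        if f(point) > fx + tol:
            return False
    return True


x = [1.0, 0.0, -2.0]        # f(x) = 3
delta = [-1.0, 0.5, 1.0]    # f(x + delta) = 1.5, so delta lies in the feasible cone
lam0 = 1.0

endpoint_ok = l1_norm([xi + lam0 * di for xi, di in zip(x, delta)]) <= l1_norm(x)
segment_ok = stays_feasible(l1_norm, x, delta, lam0)

# A direction that increases the l1 norm for every positive step.
ascent_ok = stays_feasible(l1_norm, x, [1.0, 0.0, 0.0], lam0)
```

A grid check of this kind cannot prove the proposition; it only shows the convexity argument at work on one concrete instance.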

Our second technical proposition is a change of variables formula for the feasible cone under a non- 
degenerate affine transformation. 

Proposition 2.5. Let $g$ be any convex function, and define $h(x) := g(A^{-1}(z - x))$ for some invertible matrix 
$A$. Then $\mathcal{F}(h, x) = -A\,\mathcal{F}(g, A^{-1}(z - x))$. 

The proof of Proposition 2.5 follows directly from the definition of a feasible cone. We omit the details. 

Proof of Lemma 2.3. Because $U$ is unitary, we may eliminate the variable $y$ in (1.2) via the equality 
constraint $y = U^*(z_0 - x)$. Therefore, $(x_0, y_0)$ is the unique optimum of (1.2) if and only if $x_0$ is the unique 
optimum of 

\[ \begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & g(U^*(z_0 - x)) \le g(U^*(z_0 - x_0)) = g(y_0), \end{array} \tag{2.1} \]

with decision variable $x$. The equality in (2.1) follows from the definition $z_0 = x_0 + Uy_0$. The original 
claim thus reduces to the statement that $x_0$ is the unique optimum of (2.1) if and only if 
$\mathcal{F}(f, x_0) \cap (-U\mathcal{F}(g, y_0)) = \{0\}$. The rest of the proof is devoted to this claim. 

($\Leftarrow$) Suppose $\mathcal{F}(f, x_0) \cap (-U\mathcal{F}(g, y_0)) = \{0\}$. We show that $x_0$ is the unique optimum of (2.1) by verifying 
that the strict inequality $f(x) > f(x_0)$ holds for any feasible point $x$ of (2.1) with $x \ne x_0$. 

To this end, assume $x$ is feasible for (2.1) and $x \ne x_0$. Feasibility of $x$ is equivalent to 

\[ g(U^*(z_0 - x)) \le g(U^*(z_0 - x_0)) = g(y_0). \]

By definition of the feasible cone and the transformation rule of Proposition 2.5, the inequality above 
implies 

\[ x - x_0 \in \mathcal{F}\big(g(U^*(z_0 - \cdot)), x_0\big) = -U\mathcal{F}(g, U^*(z_0 - x_0)) = -U\mathcal{F}(g, y_0). \]

The final equality above follows from $z_0 = x_0 + Uy_0$. 

Since $x \ne x_0$, the assumption $\mathcal{F}(f, x_0) \cap (-U\mathcal{F}(g, y_0)) = \{0\}$ implies $x - x_0 \notin \mathcal{F}(f, x_0)$. By the definition 
of feasible cones, we must have $f(x) = f(x_0 + (x - x_0)) > f(x_0)$. We have deduced $f(x) > f(x_0)$ for every 
feasible $x \ne x_0$, and so conclude that $x_0$ is the unique optimum of (2.1). 

($\Rightarrow$) Suppose $x_0$ is the unique optimum of (2.1). Let $\delta$ be some vector in the intersection 
$\mathcal{F}(f, x_0) \cap (-U\mathcal{F}(g, y_0))$. We must show $\delta = 0$. 

As $\delta \in \mathcal{F}(f, x_0)$, Proposition 2.4 implies that $f(x_0) \ge f(x_0 + \lambda\delta)$ for all sufficiently small $\lambda > 0$. Applying 
Proposition 2.4 to the fact $-U^*\delta \in \mathcal{F}(g, y_0)$ yields 

\[ g(y_0) \ge g(y_0 - \lambda U^*\delta) = g(U^*(z_0 - (x_0 + \lambda\delta))) \]

for all sufficiently small $\lambda > 0$. We have used the relation $z_0 = x_0 + Uy_0$ again here. In summary, for some 
small enough $\lambda > 0$, the perturbed point $x_0 + \lambda\delta$ is feasible for (2.1), and its objective value is no larger 
than $f(x_0)$. In other words, $x_0 + \lambda\delta$ is an optimal point of (2.1). But $x_0$ is the unique optimal point of (2.1) 
by assumption, so we must have $\delta = 0$. This is the claim. □ 

3 The random model and spherical integral geometry 

We now set the stage for a probabilistic analysis of the geometric optimality condition given by Lemma 2.3. 
In Section 3.1, we introduce a random model for incoherence. Section 3.2 provides background in spher- 
ical integral geometry, a subfield of integral geometry that studies random configurations of cones and 
quantities related to these configurations [SW08, Section 6.5] . This theory provides an exact expression, 
called the spherical kinematic formula, for the probability that the convex deconvolution method (1.2) 
succeeds under our random model. 

The quantities involved in the spherical kinematic formula are typically difficult to compute. To ease 
this burden, Section 3.3 defines geometric summary parameters that greatly simplify the application of 
the spherical kinematic formula. In Section 4, we exploit these ideas to develop our main results. 

To illustrate concepts from integral geometry, we use subspaces and the orthant as running exam- 
ples. These are not simply toy examples. A subspace plays an important role in the relationship between 
deconvolution and linear inverse problems in Section 5, while the orthant appears as a feasible cone in 
applications considered in Section 6. 

3.1 Incoherence and the random basis model 

Deconvolution is hopeless when the structures in the constituent signals are too strongly aligned. As an 
extreme example, suppose we observe $z_0 = x_0 + y_0$, where both $x_0$ and $y_0$ are sparse. There is clearly no 
principled way to assign the nonzero elements of $z_0$ correctly to $x_0$ and $y_0$. In contrast, if we observe 
$z_0 = x_0 + Hy_0$, where $H$ is a normalized Walsh-Hadamard transform and both $x_0$ and $y_0$ are sparse, then 
the pair $(x_0, y_0)$ is typically identifiable [Tro08]. The latter situation is more favorable than the former 
because the Walsh-Hadamard matrix and the identity matrix are incoherent; that is, their columns are 
weakly correlated. 
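
To make "weakly correlated" concrete, the following sketch computes the largest absolute inner product between a column of one basis and a column of another. This particular measure and the helper names (`hadamard`, `identity`, `coherence`) are assumptions chosen for illustration; the text above does not fix a specific coherence statistic. The Hadamard matrix is built with the Sylvester doubling construction.

```python
import math


def hadamard(d):
    """Normalized Walsh-Hadamard matrix of order d (d must be a power of two)."""
    if d < 1 or d & (d - 1):
        raise ValueError("d must be a power of two")
    rows = [[1.0]]
    while len(rows) < d:
        # Sylvester doubling step: [[H, H], [H, -H]]
        rows = [r + r for r in rows] + [r + [-t for t in r] for r in rows]
    scale = 1.0 / math.sqrt(d)
    return [[scale * t for t in r] for r in rows]


def identity(d):
    """Identity matrix of order d as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def coherence(a, b):
    """Largest absolute inner product between a column of a and a column of b."""
    d = len(a)
    best = 0.0
    for i in range(d):
        for j in range(d):
            inner = sum(a[k][i] * b[k][j] for k in range(d))
            best = max(best, abs(inner))
    return best


d = 16
mu_hadamard = coherence(identity(d), hadamard(d))   # 1 / sqrt(d) = 0.25
mu_identity = coherence(identity(d), identity(d))   # 1.0, the fully aligned case
```

The identity paired with itself attains the largest possible value, which matches the hopeless case $z_0 = x_0 + y_0$, while the Hadamard pairing attains $1/\sqrt{d}$.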

We prefer to avoid restricting our attention to special cases, such as the Walsh-Hadamard matrix. 
Therefore, we model incoherence by assuming that the basis $U = Q$, where $Q$ is drawn randomly from 
the invariant Haar measure on the orthogonal group $\mathsf{O}_d$. We call this the random basis model. This idealized approach to 
incoherence guarantees that the structures in the two constituent signals are generically oriented. We 
expect that our results also shed light on highly incoherent problems, such as the case where U = H. More 
coherent situations fall outside our purview. 

3.2 Background from spherical integral geometry 

When we couple the random basis model with Lemma 2.3, we see that the optimality condition for the 
convex deconvolution method (1.2) boils down to a geometric question: When does a randomly oriented 
cone strike a fixed cone? This section describes results from spherical integral geometry that provide an 
exact formula for this probability. 

In order to state this formula, we must introduce parameters known as spherical intrinsic volumes. 
Spherical intrinsic volumes quantify geometric properties of convex cones such as the fraction of space 
a cone consumes (a type of volume), the fraction of space taken by the corresponding dual cone, and 
quantities akin to surface area. The following characterization [Ame11, Proposition 4.4.6] is convenient. 

Definition 3.1 (Spherical intrinsic volumes). Let $K \subset \mathbb{R}^d$ be a polyhedral convex cone, and define the 
Euclidean projection onto $K$ by 

\[ \Pi_K(x) := \operatorname*{arg\,min}_{y \in K} \|x - y\|_{\ell_2}. \]

For $i = -1, 0, \dots, d-1$, we define the $i$th spherical intrinsic volume of $K$ by 

\[ v_i(K) := \mathbb{P}\big\{ \Pi_K(\omega) \text{ lies in the relative interior of an } (i+1)\text{-dimensional face of } K \big\}, \]

where the vector $\omega$ is drawn from the standard Gaussian distribution on $\mathbb{R}^d$. 

The definition of spherical intrinsic volumes extends to all closed convex cones by approximation 
with polyhedral cones, but note that the probabilistic characterization above does not hold for non- 
polyhedral cones. The technical details involve continuity properties of the spherical intrinsic volumes 
under the spherical Hausdorff metric. This theory is developed in [Gla95, Gla96]. See [SW08, Ch. 6.5] 
for a self-contained overview of spherical integral geometry developed via polyhedral approximation, or 
see [Ame11, Sec. 4] for a development using tools from differential geometry. 

The probabilistic definition of spherical intrinsic volumes is valuable in part because it makes the 
following facts nearly obvious. 

Fact 3.2. Let $K$ be a closed convex cone. Then 

1. (Positivity) $v_i(K) \ge 0$ for each $i = -1, 0, \dots, d-1$, 

2. (Unit sum) $\sum_{i=-1}^{d-1} v_i(K) = 1$, and 

3. (Basis invariance) For any basis $U \in \mathsf{O}_d$ and index $i = -1, 0, \dots, d-1$, we have $v_i(UK) = v_i(K)$. 

Proof sketch. First, assume $K$ is a polyhedral cone. The positivity of spherical intrinsic volumes follows 
from the positivity of probability. The unit sum rule follows immediately from the fact that the projection 
$\Pi_K(x)$ lies in the relative interior of a unique face of $K$. Finally, the basis invariance of spherical intrinsic 
volumes is immediate from the corresponding invariance of the Gaussian distribution. Continuity of the 
spherical intrinsic volumes under the spherical Hausdorff metric implies that these facts must hold for all 
closed convex cones by approximation with polyhedral cones. □ 

3.2.1 Two examples 

Although computing spherical intrinsic volumes is often challenging, a direct application of Definition 3.1 
can bear fruit in some situations. A subspace provides the simplest such example. 

Proposition 3.3 (Spherical intrinsic volumes of a subspace). Suppose $L \subset \mathbb{R}^d$ is a linear subspace of 
dimension $n$. Then $v_i(L) = \delta_{i,n-1}$, where $\delta_{i,j}$ is the Kronecker $\delta$ function. 

Proof. A subspace $L$ of dimension $n$ is a polyhedral cone with a single face, with dimension $n$. The 
projection of any point onto $L$ lies in the relative interior of this face. The claim follows immediately from 
Definition 3.1. □ 

The computation of the spherical intrinsic volumes for the nonnegative orthant requires only some 
elementary probability theory. We illustrate the following result in Figure 6. 

Proposition 3.4 (Spherical intrinsic volumes of the orthant [Ame11, Example 4.4.7]). The spherical 
intrinsic volumes of the nonnegative orthant $\mathbb{R}^d_+$ are given by the binomial sequence 

\[ v_i(\mathbb{R}^d_+) = 2^{-d} \binom{d}{i+1} \tag{3.1} \]

for $i = -1, 0, \dots, d-1$. 

Figure 6 Spherical intrinsic volumes of the orthant. [Plot titled "Intrinsic volumes of the nonnegative orthant".] 
The spherical intrinsic volumes of the nonnegative orthant are given by scaled binomial coefficients, as stated 
in (3.1). The scale on the horizontal axis is the normalized index $\theta = (i+1)/d$. 



Proof. The Euclidean projection onto the nonnegative orthant is given by the componentwise threshold 
operation 

\[ \big( \Pi_{\mathbb{R}^d_+}(\omega) \big)_j = \max\{\omega_j, 0\}. \]

Therefore, the projection $\Pi_{\mathbb{R}^d_+}(\omega)$ lies in the relative interior of an $(i+1)$-dimensional face of $\mathbb{R}^d_+$ if and only 
if $\omega$ has exactly $i+1$ strictly positive values. 

When $\omega$ is drawn from a standard Gaussian distribution, the number of positive entries is distributed 
as a binomial random variable. Hence, $\Pi_{\mathbb{R}^d_+}(\omega)$ lies in an $(i+1)$-dimensional face of $\mathbb{R}^d_+$ with probability 
$2^{-d}\binom{d}{i+1}$. This is precisely the definition of the $i$th spherical intrinsic volume $v_i(\mathbb{R}^d_+)$. □ 
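
The counting argument in this proof is easy to check by simulation. The sketch below (illustrative only; the function names, the dimension $d = 6$, the trial count, and the seed are arbitrary assumptions) draws Gaussian vectors, records how many coordinates are strictly positive, and compares the empirical frequencies with the binomial sequence (3.1).

```python
import math
import random


def orthant_volume_exact(d, i):
    """v_i of the nonnegative orthant in R^d from (3.1), for i = -1, ..., d-1."""
    return math.comb(d, i + 1) / 2 ** d


def orthant_volume_monte_carlo(d, trials, seed=0):
    """Estimate every v_i by counting positive coordinates of Gaussian draws."""
    rng = random.Random(seed)
    counts = [0] * (d + 1)   # counts[j] tracks faces of dimension j = i + 1
    for _ in range(trials):
        positives = sum(1 for _ in range(d) if rng.gauss(0.0, 1.0) > 0.0)
        counts[positives] += 1
    return [c / trials for c in counts]


d = 6
estimate = orthant_volume_monte_carlo(d, trials=20000)
exact = [orthant_volume_exact(d, i) for i in range(-1, d)]
max_error = max(abs(a - b) for a, b in zip(estimate, exact))
```

The list position `j` corresponds to the intrinsic volume index $i = j - 1$, so `exact[0]` is $v_{-1}$ and `exact[d]` is $v_{d-1}$.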

3.2.2 The spherical kinematic formula 

We conclude our background discussion with the following remarkable formula for the probability that 
a randomly oriented cone strikes a fixed cone. In view of Lemma 2.3, this result is fundamental to our 
understanding of the probability that the convex deconvolution method (1.2) succeeds under the random 
basis model. 

Fact 3.5 (Spherical kinematic formula [SW08, p. 261]). Let $K$ and $\tilde{K}$ be closed convex cones in $\mathbb{R}^d$, at least 
one of which is not a subspace, and let $Q$ be a random basis. Then 

\[ \mathbb{P}\{K \cap Q\tilde{K} \ne \{0\}\} = \sum_{k=0}^{d-1} \big(1 + (-1)^k\big) \sum_{i=k}^{d-1} v_i(K) \cdot v_{d-1-i+k}(\tilde{K}). \tag{3.2} \]

Remark 3.6. Fact 3.5 is usually stated for random rotations drawn according to the Haar measure on 
the special orthogonal group $\mathsf{SO}_d$, but the unitary invariance given by Fact 3.2.3 readily implies that the 
spherical kinematic formula holds for $Q$ drawn from the Haar measure on the orthogonal group $\mathsf{O}_d$. 

Remark 3.7. The spherical Gauss-Bonnet formula (Fact B.2 in Appendix B.1) can be used to eliminate the 
apparent asymmetry between $K$ and $\tilde{K}$ in (3.2). In particular, the identity 

\[ \mathbb{P}\{K \cap Q\tilde{K} \ne \{0\}\} = \mathbb{P}\{QK \cap \tilde{K} \ne \{0\}\} \]

holds for any convex cones $K$ and $\tilde{K}$. 
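
The double sum in the spherical kinematic formula is straightforward to evaluate once the intrinsic volumes are known. The sketch below assumes the index convention $v_{d-1-i+k}$ for the second cone and stores each sequence in a list whose position `j` holds the volume with index $i = j - 1$. The function names and the small test cones are assumptions for illustration; the volumes of a ray in the plane ($v_{-1} = v_0 = 1/2$) follow from Definition 3.1 by the same argument as for the orthant.

```python
import math


def strike_probability(v, v_tilde):
    """Evaluate the double sum in the spherical kinematic formula.

    v[j] and v_tilde[j] hold the intrinsic volumes with index i = j - 1,
    so both lists have length d + 1 (indices i = -1, 0, ..., d - 1).
    At least one of the two cones must not be a subspace.
    """
    d = len(v) - 1
    if len(v_tilde) != d + 1:
        raise ValueError("both cones must live in the same dimension")
    total = 0.0
    for k in range(0, d, 2):          # odd k carries weight 1 + (-1)^k = 0
        for i in range(k, d):
            total += 2.0 * v[i + 1] * v_tilde[(d - 1 - i + k) + 1]
    return total


def orthant_volumes(d):
    """Intrinsic volumes of the nonnegative orthant, Proposition 3.4."""
    return [math.comb(d, j) / 2 ** d for j in range(d + 1)]


def subspace_volumes(d, n):
    """Intrinsic volumes of an n-dimensional subspace, Proposition 3.3."""
    return [1.0 if j == n else 0.0 for j in range(d + 1)]


quadrant = orthant_volumes(2)
ray = [0.5, 0.5, 0.0]

p_two_quadrants = strike_probability(quadrant, quadrant)   # 0.5
p_quadrant_ray = strike_probability(quadrant, ray)          # 0.25
p_ray_quadrant = strike_probability(ray, quadrant)          # 0.25, same by symmetry
p_octant_line = strike_probability(orthant_volumes(3), subspace_volumes(3, 1))  # 0.25
```

These values agree with direct geometric reasoning: a random ray lands in a fixed quadrant with probability 1/4, two randomly rotated quadrants overlap with probability 1/2, and a random line in $\mathbb{R}^3$ passes through the positive or negative octant with probability 2/8.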

3.3 High-dimensional decay of spherical intrinsic volumes 

The spherical kinematic formula (3.2), coupled with the geometric optimality conditions of Lemma 2.3, 
provides an exact expression for the probability that the deconvolution problem (1.2) succeeds. Never- 
theless, the formula involves the spherical intrinsic volumes of two cones. It is challenging to determine 
these quantities directly from the definition except in simple situations. 

To confront this challenge, we seek summary statistics for the intrinsic volumes. From Figure 6, we see 
that the spherical intrinsic volumes of the orthant $v_i(\mathbb{R}^d_+)$ decay rapidly outside of the region near $i \approx \tfrac{1}{2}d$. 
When the intrinsic volumes are very small, we may neglect them in the sum (3.2) with very little loss in the 
probability estimate. We exploit this behavior with decay thresholds that describe which intrinsic volumes 
are exponentially small in the ambient dimension d. 

Definition 3.8 (Decay threshold). Let $\mathcal{D} \subset \mathbb{N}$ be an infinite set of indices, and suppose $\{K^{(d)} : d \in \mathcal{D}\}$ is an 
ensemble of closed convex cones with $K^{(d)} \subset \mathbb{R}^d$ for each $d \in \mathcal{D}$. We say that $\theta_* \in [0, 1]$ is an upper decay 
threshold for $\{K^{(d)}\}$ if, for every $\theta > \theta_*$, there exists an $\varepsilon > 0$ such that, for all sufficiently large $d \in \mathcal{D}$, we 
have 

\[ v_i(K^{(d)}) \le e^{-\varepsilon d} \quad \text{for every } i \ge \lceil \theta d \rceil. \tag{3.3} \]

On the other hand, we say that $\kappa_* \in [0, 1]$ is a lower decay threshold for $\{K^{(d)}\}$ if, for every $\kappa < \kappa_*$, there 
exists an $\varepsilon > 0$ such that, for all $d \in \mathcal{D}$ sufficiently large, we have 

\[ v_i(K^{(d)}) \le e^{-\varepsilon d} \quad \text{for every } i \le \lceil \kappa d \rceil. \]

We extend these definitions to non-closed cones by taking the closure. We say $\theta_*$ is an upper decay 
threshold for $\{K^{(d)}\}$ if and only if it is an upper decay threshold for $\{\overline{K^{(d)}}\}$. Similarly, $\kappa_*$ is a lower decay threshold 
for $\{K^{(d)}\}$ if and only if it is a lower decay threshold for $\{\overline{K^{(d)}}\}$. 

When it will not cause confusion, we omit the index set $\mathcal{D}$. 

3.3.1 Examples of decay thresholds 

Fact 3.2.2 implies that not every intrinsic volume is exponentially small, so the inequality $\kappa_* \le \theta_*$ always 
holds. However, there is nothing that precludes the case of equality, as the following two results 
demonstrate. 

Proposition 3.9 (Upper decay threshold for subspaces). Let $\mathcal{D} \subset \mathbb{N}$ be an infinite set of indices, and let 
$\{L^{(d)} : d \in \mathcal{D}\}$ be an ensemble of linear subspaces with $L^{(d)} \subset \mathbb{R}^d$ for each $d \in \mathcal{D}$. Suppose there exists a 
parameter $\sigma \in [0, 1]$ such that $\dim(L^{(d)}) = \lceil \sigma d \rceil$. Then $\theta_* = \sigma$ is an upper decay threshold for the ensemble 
$\{L^{(d)}\}$, and $\kappa_* = \sigma$ is a lower decay threshold for the ensemble $\{L^{(d)}\}$. 

Proof. Let $n = n^{(d)} = \dim(L^{(d)})$. By Proposition 3.3, we have $v_i(L^{(d)}) = \delta_{i,n-1}$. Then for any $\theta > \sigma$, we have 
$\lceil \theta d \rceil > \lceil \sigma d \rceil$ for all large enough $d \in \mathcal{D}$, so that 

\[ v_i(L^{(d)}) = 0 < e^{-d} \tag{3.4} \]

for all $i \ge \lceil \theta d \rceil$ and $d$ sufficiently large. By definition, $\theta_* = \sigma$ is an upper decay threshold for the ensemble 
of subspaces $\{L^{(d)} : d \in \mathcal{D}\}$. The demonstration that $\kappa_* = \sigma$ is a lower decay threshold for $\{L^{(d)} : d \in \mathcal{D}\}$ 
follows in the same way. □ 

Figure 7 Normalized spherical intrinsic volumes of the orthant. [Plot titled "Normalized intrinsic volume of the 
nonnegative orthant".] The graph illustrates the computation in Proposition 3.10. The normalized spherical 
intrinsic volumes $d^{-1}\log(v_i(\mathbb{R}^d_+))$ are plotted against the normalized index 
$\theta = (i+1)/d$ for several values of dimension $d$. The solid curve is the uniform limit $H(\theta) - \log(2)$ of these rescaled 
volumes, where $H(\theta)$ is the natural entropy (3.6). The smallest possible upper decay threshold is the rightmost 
point where the solid curve crosses the zero level line (dotted), and the largest possible lower decay threshold is 
the leftmost point where the zero level line crosses the solid curve. The solid curve takes its unique maximum 
value of zero at $\theta = \tfrac{1}{2}$ so that $\theta_* = \tfrac{1}{2}$ is an upper decay threshold and $\kappa_* = \tfrac{1}{2}$ is a lower decay threshold for the 
ensemble $\{\mathbb{R}^d_+ : d = 1, 2, \dots\}$ of nonnegative orthants. 



Proposition 3.10 (Upper decay threshold for the nonnegative orthant). Let $\mathcal{D} \subset \mathbb{N}$ be an infinite index set. 
The value $\theta_* = \tfrac{1}{2}$ is an upper decay threshold for the ensemble $\{\mathbb{R}^d_+ : d \in \mathcal{D}\}$ of nonnegative orthants, and 
$\kappa_* = \tfrac{1}{2}$ is a lower decay threshold for $\{\mathbb{R}^d_+ : d \in \mathcal{D}\}$. 

The computations required for the proof are illustrated in Figure 7. 

Proof. It is well known (see, e.g., [Don06b, Eq. (3.4)]) that, for any $\theta \in [0, 1]$, we have 

\[ \frac{1}{d} \log \binom{d}{\lceil \theta d \rceil} \to H(\theta) \quad \text{uniformly in } \theta \text{ as } d \to \infty, \tag{3.5} \]

where 

\[ H(\theta) := -\theta \log(\theta) - (1 - \theta) \log(1 - \theta) \tag{3.6} \]

is the natural entropy; be aware that the logarithms are base-e rather than the customary base-2 used in 
information theory. Basic calculus shows that $H(\theta)$ achieves its unique maximum at $\theta_* = \tfrac{1}{2}$, where it has 
maximum value $H(\tfrac{1}{2}) = \log(2)$. 

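Let $\theta > \tfrac{1}{2}$. By continuity of $H$, there is an $\varepsilon > 0$ such that $H(\theta') < \log(2) - \varepsilon$ for all $\theta' \ge \theta > \theta_*$. Continuity 
of the exponential and the uniform convergence in (3.5) together imply that, for all sufficiently large $d \in \mathcal{D}$, 
and any $i \ge \lceil \theta d \rceil$, we have 

\[ 2^{-d} \binom{d}{i+1} \le \exp\big(-d\log(2) + d(\log(2) - \varepsilon)\big) = e^{-\varepsilon d}. \]

The left-hand side is equal to the spherical intrinsic volume $v_i(\mathbb{R}^d_+)$ by Proposition 3.4, so we see that 
$\theta_* = \tfrac{1}{2}$ is an upper decay threshold for $\{\mathbb{R}^d_+ : d \in \mathcal{D}\}$. 

The proof that $\kappa_* = \tfrac{1}{2}$ is a lower decay threshold for $\{\mathbb{R}^d_+ : d \in \mathcal{D}\}$ follows along similar lines. Briefly, 
for any $\kappa < \tfrac{1}{2}$, there is an $\varepsilon > 0$ such that $H(\kappa') < \log(2) - \varepsilon$ for every $\kappa' \le \kappa$. For the same reasons as before, 
when $d$ is large enough, we have 

\[ 2^{-d} \binom{d}{i+1} \le \exp\big(-d\log(2) + d(\log(2) - \varepsilon)\big) = e^{-\varepsilon d} \]

for all $i \le \lceil \kappa d \rceil$. We conclude that $\kappa_* = \tfrac{1}{2}$ is a lower decay threshold for the ensemble of nonnegative 
orthants $\{\mathbb{R}^d_+ : d \in \mathcal{D}\}$. □ 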

We discuss other approaches for finding decay thresholds in Section 5. 
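
The uniform convergence used in the proof of Proposition 3.10 can be observed numerically. The following sketch (the function names and the chosen dimensions are assumptions for illustration) computes the rescaled volumes $d^{-1}\log v_i(\mathbb{R}^d_+)$ from the binomial formula of Proposition 3.4 and reports their largest deviation from the limit curve $H(\theta) - \log(2)$ at $\theta = (i+1)/d$.

```python
import math


def entropy(theta):
    """Natural entropy H(theta), with the convention 0 * log(0) = 0."""
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    return -theta * math.log(theta) - (1.0 - theta) * math.log(1.0 - theta)


def normalized_log_volume(d, i):
    """d^{-1} log v_i(R^d_+); lgamma avoids forming huge binomial coefficients."""
    k = i + 1
    log_binom = math.lgamma(d + 1) - math.lgamma(k + 1) - math.lgamma(d - k + 1)
    return (log_binom - d * math.log(2.0)) / d


def max_gap(d):
    """Largest gap between the rescaled volumes and H(theta) - log(2)."""
    return max(
        abs(normalized_log_volume(d, i) - (entropy((i + 1) / d) - math.log(2.0)))
        for i in range(-1, d)
    )


gaps = {d: max_gap(d) for d in (50, 500, 5000)}
```

The gap shrinks as $d$ grows, consistent with the plotted curves approaching the solid limit curve in Figure 7.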




4 Success and failure 

This section synthesizes the material from Sections 2 and 3 to determine whether the convex deconvo- 
lution method (1.2) succeeds, or fails, with high probability. Section 4.1 introduces the concept of a de- 
convolution ensemble. Our main results arrive in Section 4.2, where we find that the success and failure 
of the convex deconvolution method (1.2) are characterized by decay thresholds. Section 4.3 extends our 
methods to achieve uniform guarantees on the success of method (1.2). 

4.1 Ensembles of deconvolution problems 

A deconvolution ensemble is a collection of deconvolution problems that is indexed by the ambient di- 
mension of the observation. We explain this idea in the context of MCA, and we develop the abstract 
definition in Section 4.1.2. 

4.1.1 Example: The MCA deconvolution ensemble 

Recall from Section 1.1 that MCA seeks to deconvolve a superposition of two sparse vectors. Let us fix 
sparsity levels $\tau_x$ and $\tau_y$ in $[0, 1]$. For each pair $(\tau_x, \tau_y)$, we construct an ensemble of deconvolution 
problems with one problem per dimension. For each $d \in \mathbb{N}$, let $x_0^{(d)}$ and $y_0^{(d)}$ be vectors in $\mathbb{R}^d$ with 

\[ \operatorname{nnz}(x_0^{(d)}) = \lceil \tau_x d \rceil \quad \text{and} \quad \operatorname{nnz}(y_0^{(d)}) = \lceil \tau_y d \rceil. \]

In other words, the sparsity of each vector is proportional to the ambient dimension. Draw a random basis 
$Q^{(d)}$ from $\mathsf{O}_d$. We observe the vector $z_0^{(d)} = x_0^{(d)} + Q^{(d)} y_0^{(d)}$. To set up a convex deconvolution method, we 
need to introduce appropriate complexity measures for $x_0^{(d)}$ and $y_0^{(d)}$. For both vectors, the $\ell_1$ norm on 
$\mathbb{R}^d$ is the natural choice. Assume we have access to the side information 

\[ \alpha^{(d)} = \|y_0^{(d)}\|_{\ell_1}. \]

Together, these data define a deconvolution ensemble for MCA. We want to study when the MCA 
problem (1.2) succeeds with high probability for all members of the ensemble with $d$ sufficiently large. 

4.1.2 Abstract deconvolution ensembles 

It is straightforward to extend this idea to other deconvolution problems. Let $\mathcal{D} \subset \mathbb{N}$ be an infinite set of 
indices. A deconvolution ensemble consists of one problem per index. For each $d \in \mathcal{D}$, the data are 

• Vectors $x_0^{(d)}, y_0^{(d)} \in \mathbb{R}^d$, 

• A random basis $Q^{(d)} \in \mathsf{O}_d$ that is statistically independent of the other ensemble data, 

• The observation $z_0^{(d)} = x_0^{(d)} + Q^{(d)} y_0^{(d)} \in \mathbb{R}^d$, 

• Complexity measures $f^{(d)}$ and $g^{(d)}$ defined on $\mathbb{R}^d$, and 

• The side information $\alpha^{(d)} = g^{(d)}(y_0^{(d)})$. 

Where it does not cause confusion, we often omit the superscript $d$. Given such a deconvolution 
ensemble, we seek to determine conditions for which the convex deconvolution method 

\[ \begin{array}{ll} \text{minimize} & f^{(d)}(x) \\ \text{subject to} & g^{(d)}(y) \le \alpha^{(d)} \ \text{ and } \ x + Q^{(d)} y = z_0^{(d)}, \end{array} \tag{4.1} \]

succeeds with high probability when the dimension $d$ is large. 
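
For concreteness, the data of one member of the MCA ensemble can be generated as in the sketch below. Several choices here are assumptions made for illustration and are not specified in the text: the nonzero entries are random signs, the supports are chosen uniformly at random, and the random basis is produced by Gram-Schmidt orthogonalization of independent Gaussian vectors, a standard way to sample from the Haar measure on the orthogonal group. The sketch only builds the problem data; it does not solve the convex program.

```python
import math
import random


def haar_orthogonal(d, rng):
    """Random orthonormal basis of R^d via Gram-Schmidt on Gaussian vectors.

    Returns a list of d orthonormal vectors, treated as the columns of Q.
    """
    basis = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(2):   # a second pass improves numerical orthogonality
            for q in basis:
                c = sum(a * b for a, b in zip(v, q))
                v = [a - c * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        basis.append([a / norm for a in v])
    return basis


def sparse_vector(d, tau, rng):
    """Vector in R^d with ceil(tau * d) nonzero entries, each equal to +1 or -1."""
    x = [0.0] * d
    for j in rng.sample(range(d), math.ceil(tau * d)):
        x[j] = rng.choice([-1.0, 1.0])
    return x


def mca_instance(d, tau_x, tau_y, seed=0):
    """Build (x0, y0, columns of Q, z0, alpha) for one ensemble member."""
    rng = random.Random(seed)
    x0 = sparse_vector(d, tau_x, rng)
    y0 = sparse_vector(d, tau_y, rng)
    cols = haar_orthogonal(d, rng)
    z0 = list(x0)
    for coeff, col in zip(y0, cols):   # z0 = x0 + Q y0
        if coeff != 0.0:
            z0 = [z + coeff * c for z, c in zip(z0, col)]
    alpha = sum(abs(t) for t in y0)    # side information: l1 norm of y0
    return x0, y0, cols, z0, alpha


x0, y0, cols, z0, alpha = mca_instance(d=20, tau_x=0.1, tau_y=0.25)
```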

4.1.3 Success and failure with overwhelming probability 

The following definition lets us describe when (4.1) succeeds, or fails, when the dimension d is large. 

Definition 4.1 (Overwhelming probability in high dimensions). Given a deconvolution ensemble as in 
Section 4.1.2, we say that (4.1) succeeds with overwhelming probability in high dimensions if there exists 
an $\varepsilon > 0$ such that, for every sufficiently large dimension $d \in \mathcal{D}$, program (4.1) succeeds with probability 
at least $1 - e^{-\varepsilon d}$ over the randomness in $Q^{(d)}$. Similarly, we say that (4.1) fails with overwhelming 
probability in high dimensions if there exists an $\varepsilon > 0$ such that, for every sufficiently large dimension $d \in \mathcal{D}$, 
program (4.1) succeeds with probability at most $e^{-\varepsilon d}$. 

Remark 4.2. The definition above also provides a guarantee that holds for every dimension $d$. Suppose, 
for example, that program (4.1) succeeds with overwhelming probability in high dimensions. Then there 
exist constants $\varepsilon > 0$ and $d_0 \ge 0$ such that, for all $d \ge d_0$, the failure probability of (4.1) is less than $e^{-\varepsilon d}$. 
Since the failure probability is bounded above by one, it follows that 

\[ \mathbb{P}\{\text{Method (4.1) fails}\} \le C e^{-\varepsilon d} \quad \text{for every } d \in \mathcal{D}, \]

where $C = e^{\varepsilon d_0}$. 
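
The step from the large-dimension bound to the bound for every dimension can be spelled out by treating small and large dimensions separately. With $\varepsilon > 0$ and $d_0 \ge 0$ as in the remark, and $C = e^{\varepsilon d_0} \ge 1$:

\[ \mathbb{P}\{\text{Method (4.1) fails}\} \le
\begin{cases}
1 \le e^{\varepsilon (d_0 - d)} = C e^{-\varepsilon d}, & d < d_0, \\
e^{-\varepsilon d} \le C e^{-\varepsilon d}, & d \ge d_0.
\end{cases} \]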

4.2 The main results 

We are now in a position to state our main results. The first result shows that the upper decay threshold 
provides guarantees for the success of deconvolution under the model of Section 4.1. 

Theorem 4.3 (Success of deconvolution). Consider a deconvolution ensemble as in Section 4.1.2. 
Suppose the ensembles $\{\mathcal{F}(f^{(d)}, x_0^{(d)}) : d \in \mathcal{D}\}$ and $\{\mathcal{F}(g^{(d)}, y_0^{(d)}) : d \in \mathcal{D}\}$ of feasible cones have upper decay 
thresholds $\theta_x$ and $\theta_y$. If 

\[ \theta_x + \theta_y < 1, \]

then program (4.1) succeeds with overwhelming probability in high dimensions. 

The proof of this result appears in Section 4.2.3. This next result shows that the failure of 
deconvolution is characterized by the lower decay threshold. 

Theorem 4.4 (Failure of deconvolution). Consider a deconvolution ensemble as in Section 4.1.2. Suppose 
the ensembles $\{\mathcal{F}(f^{(d)}, x_0^{(d)}) : d \in \mathcal{D}\}$ and $\{\mathcal{F}(g^{(d)}, y_0^{(d)}) : d \in \mathcal{D}\}$ of feasible cones have lower decay 
thresholds $\kappa_x$ and $\kappa_y$. If 

\[ \kappa_x + \kappa_y > 1, \]

then program (4.1) fails with overwhelming probability in high dimensions. 

The proof of this second result is similar in spirit to the proof of Theorem 4.3, but it requires additional 
technical finesse; we defer the details to Appendix B.l. The complementary nature of these two results 
has striking implications. 

4.2.1 Consequences for the choice of complexity functions 

Theorems 4.3 and 4.4 show that the success, or failure, of the convex deconvolution method (4.1) in high 
dimensions depends only on the sum of the decay thresholds. As a consequence, the upper and lower 
decay thresholds assess the quality of the complexity measures $f^{(d)}$ and $g^{(d)}$ in high dimensions. 

Since the decay thresholds are independent of any interrelationship between the structured vectors 
$x_0^{(d)}$ and $y_0^{(d)}$ in the superimposed observation $z_0^{(d)}$, the quality of the complexity measure $f^{(d)}$ is 
independent of the choice $g^{(d)}$. This explains, for instance, the ubiquity of the use of the $\ell_1$ norm for inducing 
sparsity and the Schatten 1-norm as a complexity measure for rank. Simply put, when a complexity 
measure is good for one incoherent deconvolution problem, it is good for another. 

4.2.2 Sharp phase transitions and an open conjecture 

The complementary nature of Theorems 4.3 and 4.4 has an important consequence for phase transitions 
in the convex deconvolution method (4.1). Equality between the lower and upper decay thresholds ap- 
pears to hold in the cases where we have access to exact formulas for the spherical intrinsic volumes. 
Proposition 3.9 shows that the upper and lower decay thresholds are equal for subspaces whose dimen- 
sion is proportional to the ambient space, and Proposition 3.10 implies that the upper decay threshold 
is equal to the lower decay threshold for the ensemble of nonnegative orthants. Moreover, our compu- 
tations in Appendix C suggest that equality between the upper and lower decay thresholds also holds for 
the ensemble of feasible cones of the $\ell_1$ norm at vectors with a fixed proportion of nonzero elements. 

The equality of the upper and lower decay thresholds explains the close agreement between our the- 
oretical bounds and the empirical experiments. For instance, consider Figure 1. For sparsity levels below 
the green curve, the sum of the upper decay thresholds is less than one, so Theorem 4.3 implies that de- 
convolution succeeds with overwhelming probability in high dimensions. On the other hand, with spar- 
sity levels above the green curve, the sum of the lower decay thresholds exceeds one, so deconvolution 
fails with overwhelming probability in high dimensions by Theorem 4.4. (See Section 6.1 for the details of 
this calculation.) 
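
A sharp transition of this kind can be seen directly in a case where the intrinsic volumes are known in closed form. The sketch below is an illustration and not the calculation behind Figure 1: it evaluates the probability that a randomly oriented $n$-dimensional subspace of $\mathbb{R}^d$ strikes the nonnegative orthant, obtained by substituting Propositions 3.3 and 3.4 into the spherical kinematic formula of Fact 3.5, so that only even $k \le n - 1$ with $i = d - n + k$ contribute. The function name and the values of $d$ and $n$ are assumptions chosen for the example.

```python
import math


def subspace_orthant_strike_probability(d, n):
    """P{ the orthant in R^d meets a random n-dimensional subspace nontrivially }.

    Closed form from the kinematic formula: 2 * sum over even k <= n - 1 of
    2^{-d} * C(d, d - n + k + 1).
    """
    total = 0
    for k in range(0, n, 2):
        total += math.comb(d, d - n + k + 1)
    return 2 * total / 2 ** d


d = 200
sweep = {n: subspace_orthant_strike_probability(d, n) for n in (60, 90, 100, 110, 140)}
```

The subspace ensemble has decay thresholds equal to $n/d$ and the orthant has thresholds equal to $\tfrac{1}{2}$, so the sum of thresholds crosses one at $n/d = \tfrac{1}{2}$. In the sweep, the strike probability is negligible at $n = 60$, equals one half at $n = 100$, and is essentially one at $n = 140$.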

One may wonder whether the transition between success and failure is sharp in general. By Theo- 
rems 4.3 and 4.4, this question is closely related to the question of when the lower decay threshold is 
equal to the upper decay threshold. The question is related to an open conjecture in spherical integral 
geometry. 

Amelunxen conjectures that the sequence of spherical intrinsic volumes [Ame11, Conj. 4.4.16] is 
log-concave, that is, $v_i(K)^2 \ge v_{i-1}(K)\, v_{i+1}(K)$ for each $i = 0, \dots, d-2$. Roughly speaking, the log-concavity 
conjecture would imply that a sufficiently uniform limit of $d^{-1}\log(v_{\lceil \theta d \rceil - 1}(K^{(d)}))$ as $d \to \infty$ is a concave 
function of $\theta$ with a maximal value of zero. If, in addition, this maximum is attained at a unique point, 
then the lower decay threshold must equal the upper decay threshold. This unique maximum property 
holds for the ensemble of orthants $\{\mathbb{R}^d_+ : d \in \mathbb{N}\}$, and it holds numerically for the ensemble of feasible cones 
of the $\ell_1$ norm at vectors with a fixed proportion of nonzero entries. (See Figure 12 in the Appendix.) If 
this is a generic property of spherical intrinsic volumes, then we could conclude that there is always a 
sharp transition between the success and failure of deconvolution. 
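
For the orthant, log-concavity of the intrinsic volumes can be confirmed exactly, because the volumes are scaled binomial coefficients by Proposition 3.4 and the common factor $2^{-d}$ cancels from both sides of the inequality. The sketch below uses integer arithmetic; the function names and the tested dimensions are assumptions for illustration, and the check says nothing about general cones, where the statement remains a conjecture.

```python
import math


def orthant_volumes_scaled(d):
    """Integer sequence 2^d * v_i(R^d_+) = C(d, i + 1) for i = -1, ..., d - 1."""
    return [math.comb(d, j) for j in range(d + 1)]


def is_log_concave(seq):
    """Check seq[j]^2 >= seq[j-1] * seq[j+1] at every interior position."""
    return all(
        seq[j] ** 2 >= seq[j - 1] * seq[j + 1] for j in range(1, len(seq) - 1)
    )


results = {d: is_log_concave(orthant_volumes_scaled(d)) for d in (1, 2, 10, 100)}
```

Interior list positions $j = 1, \dots, d-1$ correspond to the indices $i = 0, \dots, d-2$ appearing in the conjecture.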

4.2.3 Proof of Theorem 4.3 

The proof of Theorem 4.3 follows readily from a geometric statement concerning the probability that a 
random cone strikes a fixed cone as the dimension d becomes large. 

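Theorem 4.5. Suppose $\mathcal{D}$ is an infinite set of indices, and let $\{K^{(d)} \subset \mathbb{R}^d : d \in \mathcal{D}\}$ and $\{\tilde{K}^{(d)} \subset \mathbb{R}^d : d \in \mathcal{D}\}$ be 
two ensembles of closed convex cones with upper decay thresholds $\theta_*$ and $\tilde{\theta}_*$. If $\theta_* + \tilde{\theta}_* < 1$, then there 
exists a constant $\varepsilon > 0$ such that $\mathbb{P}\{K^{(d)} \cap Q\tilde{K}^{(d)} \ne \{0\}\} \le e^{-\varepsilon d}$ for all sufficiently large $d \in \mathcal{D}$. 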

Before proceeding to the proof, let us explain how Theorem 4.3 follows from Theorem 4.5 and the 
geometric optimality conditions of Lemma 2.3. 

Figure 8 Main idea behind the proof of Theorem 4.5. [Two panels, each plotting $v_i(K)$ (top) and $v_k(\tilde{K})$ 
(bottom), with the position $k = d - i - 1$ marked.] When the spherical intrinsic volumes are very small for 
large $i$, the product of the outer (top) and inner (bottom) terms in equation (4.2) is always small. In the left 
panel, the large value of $v_i(K)$ is offset by the small inner sum $\sum_{k=d-i-1}^{d-1} v_k(\tilde{K})$. On the right, the small value of 
$v_i(K)$ counteracts the large sum $\sum_{k=d-i-1}^{d-1} v_k(\tilde{K})$. This situation is guaranteed to occur when the upper decay 
thresholds satisfy $\theta_* + \tilde{\theta}_* < 1$ and the ambient dimension $d$ is large. 



Proof of Theorem 4.3 from Theorem 4.5. The assumptions in Theorem 4.3 imply that the ensembles 
$\{\overline{\mathcal{F}(f^{(d)}, x_0^{(d)})} : d \in \mathcal{D}\}$ and $\{-\overline{\mathcal{F}(g^{(d)}, y_0^{(d)})} : d \in \mathcal{D}\}$ of closed cones satisfy the hypothesis of Theorem 4.5. 
Thus, there is an $\varepsilon > 0$ for which 

\[ \overline{\mathcal{F}(f^{(d)}, x_0^{(d)})} \cap \big( -Q\,\overline{\mathcal{F}(g^{(d)}, y_0^{(d)})} \big) = \{0\} \]

except with probability $e^{-\varepsilon d}$, for all sufficiently large $d$. 

Since cones are contained in their closure, the two feasible cones have a trivial intersection at least 
as frequently as their closures (but see the remark below). Applying our geometric optimality condition, 
Lemma 2.3, immediately implies that (4.1) succeeds with probability at least $1 - e^{-\varepsilon d}$ for every sufficiently 
large $d$. □ 

Remark 4.6. In fact, the probability that randomly oriented convex cones strike is equal to the proba- 
bility that their closures strike. This seemingly innocuous claim appears to have no simple proof from 
first principles. However, this fact readily follows from the discussion of touching probabilities in [SW08, 
pp. 258-259]. 

Figure 8 illustrates the main idea behind the following proof. 

Proof of Theorem 4.5. The spherical kinematic formula (3.2) only applies when at least one cone is not 
a subspace, so we split the demonstration into two cases. First, we assume that at least one of the 
ensembles $\{K^{(d)}\}$ or $\{\tilde{K}^{(d)}\}$ of cones does not contain any subspace. Then, we consider the case where both 
ensembles of cones contain only subspaces. The argument readily extends to the general case by 
considering subsequences where one of the two cases above holds. Throughout the proof, we drop the explicit 
dependence of $K$ and $\tilde{K}$ on the dimension $d$ for notational clarity. 

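We start with the first case: assume that either $K$ or $\tilde{K}$ is not a subspace. Our main tool is the spherical 
kinematic formula, Fact 3.5. The positivity of the spherical intrinsic volumes and the bound $(1 + (-1)^k) \le 2$ 
imply that the probability $P$ of interest satisfies 

\[ P := \mathbb{P}\{K \cap Q\tilde{K} \ne \{0\}\} \le 2 \sum_{k=0}^{d-1} \sum_{i=k}^{d-1} v_i(K) \cdot v_{d-1-i+k}(\tilde{K}) 
   = 2 \sum_{i=0}^{d-1} v_i(K) \sum_{k=d-i-1}^{d-1} v_k(\tilde{K}). \tag{4.2} \]

The equality follows by a change in the order of summation and a change of the summation index. Since 
$\theta_* + \tilde{\theta}_* < 1$, there exist parameters $\theta > \theta_*$ and $\tilde{\theta} > \tilde{\theta}_*$ for which $\theta + \tilde{\theta} < 1$. We expand the right-hand sum 
of (4.2) into four terms, say $\Sigma_1$, $\Sigma_2$, $\Sigma_3$, and $\Sigma_4$: 

\[ \begin{aligned}
\tfrac{1}{2} P \le{} & \underbrace{\sum_{i=0}^{\lceil \theta d \rceil} v_i(K) \sum_{k=d-i-1}^{\lceil \tilde{\theta} d \rceil} v_k(\tilde{K})}_{=:\,\Sigma_1} 
 + \underbrace{\sum_{i=\lceil \theta d \rceil + 1}^{d-1} v_i(K) \sum_{k=d-i-1}^{\lceil \tilde{\theta} d \rceil} v_k(\tilde{K})}_{=:\,\Sigma_2} \\
 & + \underbrace{\sum_{i=0}^{\lceil \theta d \rceil} v_i(K) \sum_{k=\lceil \tilde{\theta} d \rceil + 1}^{d-1} v_k(\tilde{K})}_{=:\,\Sigma_3} 
 + \underbrace{\sum_{i=\lceil \theta d \rceil + 1}^{d-1} v_i(K) \sum_{k=\lceil \tilde{\theta} d \rceil + 1}^{d-1} v_k(\tilde{K})}_{=:\,\Sigma_4}. 
\end{aligned} \tag{4.3} \]

We bound each summand separately. First, the fact $\theta + \tilde{\theta} < 1$ implies that, for all sufficiently large $d$, we 
have $\lceil \theta d \rceil + \lceil \tilde{\theta} d \rceil < d - 1$. We conclude that $\Sigma_1 = 0$ when $d$ is large enough: The inner sum is empty. 
To bound $\Sigma_2$, apply Facts 3.2.1 and 3.2.2 to the inner sum to find 

\[ \Sigma_2 \le \sum_{i=\lceil \theta d \rceil + 1}^{d-1} v_i(K) \le (d-1)\, e^{-\varepsilon' d}, \]

where the second inequality holds for some $\varepsilon' > 0$ and all sufficiently large $d$ owing to the definition of 
the upper decay threshold $\theta_*$. Through analogous reasoning, the definition of $\tilde{\theta}_*$ gives the exponential 
bounds 

\[ \Sigma_3 \le (d-1)\, e^{-\varepsilon'' d} \quad \text{and} \quad \Sigma_4 \le (d-1)\, e^{-\varepsilon''' d}, \]

for some $\varepsilon'', \varepsilon''' > 0$ and all sufficiently large $d$. Taking $\varepsilon$ sufficiently small (say $\varepsilon = \tfrac{1}{2}\min\{\varepsilon', \varepsilon'', \varepsilon'''\}$) and $d$ 
sufficiently large gives the result for the first case. 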

For the second case, suppose that $K$ and $\tilde K$ are both subspaces. Set $n := \dim(K)$ and $\tilde n := \dim(\tilde K)$.
Choose parameters $\theta > \theta_\star$ and $\tilde\theta > \tilde\theta_\star$ such that $\theta + \tilde\theta < 1$. The definition of the upper decay threshold and
Proposition 3.3 imply that

$$v_i(K) = \delta_{i,n-1} \le e^{-\varepsilon' d}$$

for some $\varepsilon' > 0$, all sufficiently large $d$, and every $i \ge \lceil\theta d\rceil$. In particular, this inequality implies $n - 1 < \lceil\theta d\rceil$
for all $d$ large enough. Similarly, we find $\tilde n - 1 < \lceil\tilde\theta d\rceil$ for all $d$ sufficiently large.
Since $\theta + \tilde\theta < 1$, we have

$$n + \tilde n < \lceil\theta d\rceil + \lceil\tilde\theta d\rceil + 2 \le d$$

whenever $d$ is large enough. That is, the sum of the dimensions of the subspaces is less than the ambient
dimension. Since two randomly oriented subspaces are almost surely in general position, we see that
$K \cap Q\tilde K = \{0\}$ with probability one, whenever $d$ is large enough. This completes the second case, so we are
done. □
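
The subspace case can be checked numerically against the bound (4.2). The following Python sketch assumes the convention from Proposition 3.3 that an $n$-dimensional subspace of $\mathbb{R}^d$ has spherical intrinsic volumes $v_i = \delta_{i,n-1}$ for $i = 0, \dots, d-1$. The function names `subspace_volumes` and `kinematic_bound` are illustrative and do not come from this work.

```python
def subspace_volumes(n, d):
    """Spherical intrinsic volumes v_0..v_{d-1} of an n-dimensional subspace
    of R^d, using v_i = delta_{i, n-1} (Proposition 3.3). Assumes 1 <= n <= d."""
    return [1.0 if i == n - 1 else 0.0 for i in range(d)]


def kinematic_bound(v, v_tilde):
    """Right-hand side of (4.2):
    2 * sum_{i=0}^{d-1} v_i(K) * sum_{k=d-i-1}^{d-1} v_k(K~)."""
    d = len(v)
    total = 0.0
    for i in range(d):
        inner = sum(v_tilde[k] for k in range(d - i - 1, d))
        total += v[i] * inner
    return 2.0 * total


d = 10
# Dimensions sum to d: the bound vanishes.
print(kinematic_bound(subspace_volumes(5, d), subspace_volumes(5, d)))
# Dimensions sum to d + 1: the bound no longer vanishes.
print(kinematic_bound(subspace_volumes(6, d), subspace_volumes(5, d)))
```

Under these assumptions the bound is zero exactly when $n + \tilde n \le d$, which agrees with the general-position argument used in the second case.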



4.3 Uniform deconvolution guarantees 

Suppose that $Q$ is drawn at random and fixed. We now consider the probability that (1.2) will deconvolve
every structured pair $(x_0, y_0)$ from the associated observation $z_0 = x_0 + Q y_0$. By the geometric optimality
condition of Lemma 2.3, this probability is equal to the probability of the event

$$\mathcal{F}(f, x_0) \cap (-Q\,\mathcal{F}(g, y_0)) = \{0\} \quad\text{for every structured pair } (x_0, y_0). \tag{4.4}$$

We approach this problem by taking a union bound over the feasible cones; the rest of the details follow 
the proof of Theorem 4.5. This section is technical, and it may be skipped on a first reading. 

We require an extension of the upper decay threshold that provides detailed information on the rate 
of decay of spherical intrinsic volumes. 
[Figure 9 plot: "Computing decay thresholds for the nonnegative orthant". Horizontal axis: normalized index $\theta = i/d$.]

Figure 9 Computation of decay thresholds for the orthant. The solid curve is the limiting behavior of the
spherical intrinsic volumes of the orthant as in Figure 7. The rightmost point where the horizontal line at level
$-\psi$ crosses this curve is an upper decay threshold at level $\psi$. From the diagram, we see that $\theta_\star = \tfrac{1}{2}$ is an upper
decay threshold at level zero, while $\theta_\star \approx 0.72$ is an upper decay threshold at level $\psi = 0.1$.



Definition 4.7 (Decay at level $\psi$). Let $\mathcal{D} \subset \mathbb{N}$ be an infinite set of indices, let $\{K^{(d)} : d \in \mathcal{D}\}$ be an ensemble
of closed convex cones with $K^{(d)} \subset \mathbb{R}^d$, and suppose $\psi \ge 0$. We say that $\theta_\star$ is an upper decay threshold at
level $\psi$ for the ensemble $\{K^{(d)}\}$ if, for every $\theta > \theta_\star$, there exists a constant $\varepsilon > 0$ such that, for all sufficiently
large $d$, the inequality

$$v_i(K^{(d)}) \le e^{-d(\psi + \varepsilon)}$$

holds for all $i \ge \lceil\theta d\rceil$. When no level is specified, we take $\psi = 0$ for compatibility with Definition 3.8. As in
Definition 3.8, this definition extends to non-closed cones by taking the closure.

4.3.1 Two examples 

Subspaces and orthants again provide useful examples. 

Proposition 4.8 (Decay at level $\psi$ for an ensemble of subspaces). Let $\{L^{(d)}\}$ be an infinite ensemble of
linear subspaces with $L^{(d)} \subset \mathbb{R}^d$ and $\dim(L^{(d)}) = \lceil\sigma d\rceil$, and suppose $\psi \ge 0$. Then $\theta_\star = \sigma$ is an upper decay
threshold for $\{L^{(d)}\}$ at level $\psi$.

Proof. The proof is substantially similar to the proof of Proposition 3.9. Let $n = \dim(L^{(d)})$, so that
$v_i(L^{(d)}) = \delta_{i,n-1}$ by Proposition 3.3. For any $\theta > \sigma$, we have $\lceil\theta d\rceil > \lceil\sigma d\rceil$ for all large enough $d$. Therefore,

$$v_i(L^{(d)}) = 0 < e^{-(\psi+1)d}$$

for all $i \ge \lceil\theta d\rceil$ and $d$ sufficiently large. By definition, $\theta_\star = \sigma$ is an upper decay threshold at level $\psi$ for
$\{L^{(d)}\}$. □
The computation for the orthant is only slightly more involved. The following result is illustrated in
Figure 9.

Proposition 4.9 (Decay at level $\psi$ for $\{\mathbb{R}^d_+\}$). Suppose $0 \le \psi < \log(2)$, and let $\mathcal{D} \subset \mathbb{N}$ be an infinite set of
indices. Define

$$\theta_{\mathbb{R}^d_+}(\psi) := \sup\{\theta : H(\theta) \ge \log(2) - \psi\},$$

where $H(\theta)$ is the entropy defined in (3.6). Then $\theta_{\mathbb{R}^d_+}(\psi)$ is an upper decay threshold at level $\psi$ for the
ensemble $\{\mathbb{R}^d_+\}$ of orthants.

Proof sketch. Let $\theta > \theta_{\mathbb{R}^d_+}(\psi)$. As in the proof of Proposition 3.10, there exists an $\varepsilon > 0$ such that, for all $d$
sufficiently large, we have

ViWt) < exp(d(H(0 v ) -log(2) + £)) = exp(-<i(^ + £)) 

for every $i \ge \lceil\theta d\rceil$. By definition, $\theta_{\mathbb{R}^d_+}(\psi)$ is an upper decay threshold at level $\psi$ for the ensemble $\{\mathbb{R}^d_+\}$. □
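
The threshold in Proposition 4.9 is easy to evaluate numerically. The sketch below assumes that the entropy in (3.6) is the natural-logarithm binary entropy $H(\theta) = -\theta\log\theta - (1-\theta)\log(1-\theta)$, which is consistent with the values quoted in Figure 9. The function names are hypothetical. Since $H$ decreases on $[\tfrac{1}{2}, 1]$, the supremum can be located by bisection.

```python
import math


def entropy(theta):
    """Binary entropy with natural logarithms (assumed form of (3.6))."""
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    return -theta * math.log(theta) - (1.0 - theta) * math.log(1.0 - theta)


def orthant_threshold(psi, tol=1e-12):
    """sup{theta : H(theta) >= log(2) - psi} for 0 <= psi < log(2).

    H is decreasing on [1/2, 1], so the supremum is found by bisection there."""
    if not 0.0 <= psi < math.log(2.0):
        raise ValueError("psi must lie in [0, log 2)")
    target = math.log(2.0) - psi
    lo, hi = 0.5, 1.0  # H(1/2) = log 2 >= target > 0 = H(1)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if entropy(mid) >= target:
            lo = mid
        else:
            hi = mid
    return lo


print(round(orthant_threshold(0.0), 3))  # level zero
print(round(orthant_threshold(0.1), 3))  # level 0.1, compare with Figure 9
```

Under this assumption the two printed values can be compared directly with the thresholds read off Figure 9 at levels zero and $\psi = 0.1$.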

We now extend the decay threshold to cover the family of cones considered in condition (4.4). In our 
applications, the feasible cones appearing in (4.4) are unitarily congruent, so we restrict our attention to 
this case. 

Definition 4.10 (Decay threshold for an ensemble of sets of cones). For an infinite set of indices $\mathcal{D} \subset \mathbb{N}$,
let $\{\mathcal{K}^{(d)} : d \in \mathcal{D}\}$ be an ensemble of sets of unitarily congruent cones, indexed by the ambient dimension,
and let $\{K^{(d)}\}$ be an ensemble of exemplars, that is, $K^{(d)} \in \mathcal{K}^{(d)}$ for every $d \in \mathcal{D}$. We say that $\{\mathcal{K}^{(d)}\}$ has
upper decay threshold $\theta_\star$ at level $\psi$ if the sequence of exemplars $\{K^{(d)}\}$ has upper decay threshold $\theta_\star$ at level
$\psi$.

By Fact 3.2.3, the decay threshold for an ensemble $\{\mathcal{K}^{(d)}\}$ of sets of unitarily congruent cones is
independent of the choice of exemplars, so this nomenclature is well defined. We now state an analogue to
Theorem 4.5 that bounds the probability that a number of cones strike, provided there are not too many
cones.

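Theorem 4.11. Let $\{\mathcal{K}^{(d)}\}$ and $\{\tilde{\mathcal{K}}^{(d)}\}$ be two ensembles of sets of unitarily congruent closed convex cones,
indexed by ambient dimension $d$. Suppose the cardinalities of $\mathcal{K}^{(d)}$ and $\tilde{\mathcal{K}}^{(d)}$ grow no faster than
exponentially: there exist $\psi, \tilde\psi$ such that, for every $\eta > 0$ and all $d$ sufficiently large, we have the inequalities
$|\mathcal{K}^{(d)}| \le e^{d(\psi + \eta)}$ and $|\tilde{\mathcal{K}}^{(d)}| \le e^{d(\tilde\psi + \eta)}$. Suppose further that $\{\mathcal{K}^{(d)}\}$ and $\{\tilde{\mathcal{K}}^{(d)}\}$ have respective upper decay
thresholds $\theta_\star$ and $\tilde\theta_\star$, each at level $\psi + \tilde\psi$.

If $\theta_\star + \tilde\theta_\star < 1$, then there exists a constant $\varepsilon > 0$ such that for every sufficiently large $d$,

$$\mathbb{P}\bigl\{K \cap Q\tilde K \ne \{0\} \text{ for any } K \in \mathcal{K}^{(d)},\ \tilde K \in \tilde{\mathcal{K}}^{(d)}\bigr\} \le e^{-\varepsilon d},$$

where the probability is taken over a random orthogonal $Q$.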

The proof of Theorem 4.11 simply couples a union bound to the proof of Theorem 4.5, so we defer 
the demonstration to Appendix B.2. Note that the statement of Theorem 4.11 is equivalent to that of 
Theorem 4.5 when $\mathcal{K}$ and $\tilde{\mathcal{K}}$ are singletons, because we may take $\psi = \tilde\psi = 0$ in this case.

Theorem 4.11 can be used to verify that event (4.4) holds with high probability when the dimension 
d becomes large. In Section 6.1.1, we use this approach to verify that the MCA formulation (1.1) can 
deconvolve all sufficiently sparse vectors, and in Section 6.2.2, we use Theorem 4.11 to show that the 
channel coding method (1.4) is robust to adversarial sparse corruptions. 
5 Computing decay thresholds



Section 4.2 demonstrates that the decay thresholds provide a simple way to analyze deconvolution under 
the random basis model. This section describes several methods for computing decay thresholds. We 
begin by considering direct approaches, where precise formulas for the spherical intrinsic volumes give 
correspondingly precise thresholds. 

The direct method is powerful, but its application is limited to regimes where we have access to for- 
mulas for spherical intrinsic volumes. In Section 5.2, we observe that known results on linear inverse 
problems [CRPW10] imply bounds on decay thresholds. This observation allows us to study upper decay 
thresholds for several structural classes, including low-rank matrices. 

5.1 Direct approach 

There are several situations where the direct approach for calculating decay thresholds is feasible. Propo- 
sitions 3.9 and 3.10 compute upper and lower decay thresholds for ensembles of subspaces and orthants 
directly from the definition of spherical intrinsic volumes. In Appendix C, we use the asymptotic polytope 
angle computations of [Don06b] to compute the decay threshold for ensembles of feasible cones of the 
$\ell_1$ norm at sparse vectors. The approach follows roughly the same lines as Propositions 3.10 and 4.9, but
the argument requires a good deal of background information that is tangential to this work. 

5.2 Relationship to linear inverse problems 

There is a useful link between the number of random linear measurements required to identify a struc- 
tured signal with a convex complexity measure and the upper decay threshold of the associated feasible 
cone. Roughly speaking, the upper decay threshold is the ratio between the number of linear measure- 
ments required to identify a structured signal and the ambient dimension. This observation provides a 
powerful method for determining decay thresholds. 

Let $x_0$ be a structured vector, and let $f$ be an associated convex complexity measure. Suppose we
observe the linear image $z_0 = A x_0$, where $A$ is a known matrix. Chandrasekaran et al. [CRPW10] study the
convex optimization program

$$\text{minimize } f(x) \quad\text{subject to } A x = z_0. \tag{5.1}$$

These authors consider the question "Given the data $z_0 = A x_0$, when is $x_0$ the unique optimal point
of (5.1)?"

The program (5.1) identifies $x_0$ precisely when the null space of $A$ intersects the feasible cone $\mathcal{F}(f, x_0)$
trivially [CRPW10, Prop. 2.1]. When $A$ is a Gaussian matrix, its null space is a randomly oriented subspace.
This observation leads to a connection between the decay threshold and the number of observations 
required to recover a vector with (5.1) under a Gaussian measurement model. 

As in Section 4.1, we consider an ensemble of problems indexed by the ambient dimension $d$. Fix
an undersampling parameter $\sigma \in [0, 1]$. For each $d$ in some infinite set $\mathcal{D} \subset \mathbb{N}$ of indices, assume we
are given a structured vector $x_0^{(d)} \in \mathbb{R}^d$, a Gaussian measurement matrix $\Omega^{(d)} \in \mathbb{R}^{n \times d}$ with $n = \lceil\sigma d\rceil$ rows, the observation
$z_0^{(d)} = \Omega^{(d)} x_0^{(d)}$, and a complexity measure $f^{(d)}$ associated with the structure of $x_0^{(d)}$.

We attempt to identify $x_0^{(d)}$ by solving the optimization problem

$$\text{minimize } f^{(d)}(x) \quad\text{subject to } \Omega^{(d)} x = z_0^{(d)} \tag{5.2}$$

with decision variable $x \in \mathbb{R}^d$. This method succeeds when $x_0^{(d)}$ is the unique optimal point of (5.2). The
following result shows that the problem of computing the number of random linear measurements needed
to identify a structured vector is equivalent to determining an upper decay threshold.
Lemma 5.1. Consider the ensemble described above.

1. Suppose the ensemble $\{\mathcal{F}(f^{(d)}, x_0^{(d)})\}$ of feasible cones has an upper decay threshold $\theta_\star < \sigma$.
Then (5.2) succeeds with overwhelming probability in high dimensions.

2. On the other hand, suppose the linear inverse program (5.2) succeeds with overwhelming probability
in high dimensions. Then the ensemble $\{\mathcal{F}(f^{(d)}, x_0^{(d)})\}$ of feasible cones has an upper decay threshold
$\theta_\star = \sigma$.

The proof of Lemma 5.1 appears in Appendix D. 

In Appendix C.3, we describe how our computation of the upper decay threshold for the feasible cone
of the $\ell_1$ norm at sparse vectors relates to the recovery guarantees for basis pursuit explored in the series
of papers [Don04, Don06b, DT09a, DT09b, DT10b]. Our approach also yields a sharp transition between
success and failure regimes for basis pursuit. See Appendix C.3.2 for the details.

Remark 5.2. A counterpart to Lemma 5.1 that links the failure of linear inverse problems to the lower
decay threshold $\kappa_\star$ is readily derivable with the techniques used in this work. This may enable the
computation of lower decay thresholds through information-theoretic arguments [Wai09].



5.2.1 The upper decay threshold from the Gaussian width 

Lemma 5.1 provides a powerful tool for computing upper decay thresholds. Define the Gaussian width of
a cone $K \subset \mathbb{R}^d$ by the expression

$$W(K \cap S^{d-1}) := \mathbb{E}\Bigl[\,\sup_{x \in K \cap S^{d-1}} \langle \omega, x \rangle\Bigr],$$

where the random vector $\omega$ is drawn from the Gaussian distribution on $\mathbb{R}^d$. The following corollary lets us
determine upper decay thresholds from Gaussian width bounds. As usual, $\mathcal{D}$ is an infinite subset of the
natural numbers.

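Corollary 5.3. Consider an ensemble $\{\mathcal{F}(f^{(d)}, x_0^{(d)}) \subset \mathbb{R}^d : d \in \mathcal{D}\}$ of feasible cones. Suppose that

$$\limsup_{d \to \infty} \frac{1}{d}\, W\bigl(\mathcal{F}(f^{(d)}, x_0^{(d)}) \cap S^{d-1}\bigr)^2 \le \theta_\star. \tag{5.3}$$

Then $\theta_\star$ is an upper decay threshold (at level zero) for $\{\mathcal{F}(f^{(d)}, x_0^{(d)})\}$.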

Proof. Let $\Omega^{(d)} \in \mathbb{R}^{n \times d}$ be a Gaussian matrix. The result* [CRPW10, Thm. 3.2] states that (5.2) succeeds
except with probability | e~ is c d whenever the number n of measurements satisfies 

$$W\bigl(\mathcal{F}(f^{(d)}, x_0^{(d)}) \cap S^{d-1}\bigr)^2 \le n - c\sqrt{d},$$

where $c > 0$ is a constant independent of the ambient dimension $d$.

Define $\theta_\varepsilon := \theta_\star + \varepsilon$ for some $\varepsilon > 0$. By assumption (5.3), there exists a constant $c_\varepsilon > 0$ such that for all
sufficiently large $d \in \mathcal{D}$, we have

$$W\bigl(\mathcal{F}(f^{(d)}, x_0^{(d)}) \cap S^{d-1}\bigr)^2 \le \lceil\theta_\varepsilon d\rceil - c_\varepsilon \sqrt{d}.$$

Therefore, the linear inverse problem (5.2) succeeds with overwhelming probability in high dimensions
when the number of measurements satisfies $n \ge \lceil\theta_\varepsilon d\rceil$. By Lemma 5.1, Part 2, we see that $\theta_\varepsilon$ is an upper
decay threshold for $\mathcal{F}(f, x_0)$.

The proof is completed by verifying that a limit of upper decay thresholds is itself an upper decay
threshold and taking $\varepsilon \to 0$. We omit this straightforward, but technical, argument. □
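
As a sanity check on Corollary 5.3, the Gaussian width of the orthant can be estimated by simulation. The sketch below is only an illustration: it assumes that $\omega$ has independent standard normal entries, and it uses the fact that the supremum of $\langle\omega, x\rangle$ over unit vectors in $\mathbb{R}^d_+$ equals the Euclidean norm of the positive part of $\omega$ (or the largest entry of $\omega$ when no entry is positive). The function names are hypothetical.

```python
import math
import random


def orthant_sup(omega):
    """sup of <omega, x> over unit vectors x in the nonnegative orthant."""
    pos = math.sqrt(sum(w * w for w in omega if w > 0.0))
    # If no entry is positive, the best unit vector is a coordinate vector.
    return pos if pos > 0.0 else max(omega)


def orthant_width_ratio(d, trials, seed=0):
    """Monte Carlo estimate of W(orthant in R^d, intersected with the sphere)^2 / d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        omega = [rng.gauss(0.0, 1.0) for _ in range(d)]
        total += orthant_sup(omega)
    width = total / trials
    return width * width / d


print(orthant_width_ratio(d=400, trials=200))
```

The estimate should be close to one half, in line with the level-zero upper decay threshold for orthants listed in Table 1.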



* This result is due essentially to Gordon [Gor87, Gor88] . 
Table 1 Decay of intrinsic volumes. The decay thresholds developed in this work appear below.

                                                      Level zero                            Level $\psi$
Cone ensemble                                         Upper threshold    Lower threshold    Upper threshold
                                                      $\theta_\star$     $\kappa_\star$     $\theta_\star(\psi)$

Orthants $\mathbb{R}^d_+$                             $1/2$              $1/2$              (See Prop. 4.9)

Feasible cones of $\ell_1$ norm at                    (See Appendix C)
$\tau$-sparse vectors

Subspaces of dimension $\lceil\sigma d\rceil$         $\sigma$           $\sigma$           $\sigma$

Feasible cones of the Schatten 1-norm                 $6\rho - 5\rho^2$
at $n \times n$ rank $\lceil\rho n\rceil$ matrices

Feasible cones of the spectral norm                   $3/4$
at orthogonal matrices


This machinery allows us to compute upper decay thresholds in situations involving matrix observations.
For simplicity we consider only the space $\mathbb{R}^{n \times n}$ of square $n \times n$ matrices, where $n \in \mathbb{N}$. The ambient
dimension of this vector space is $d = n^2$, but we will index the observations by the parameter $n$. This poses
no difficulty because it is equivalent to indexing over the set $\mathcal{D} = \{1^2, 2^2, 3^2, \dots\}$.

Proposition 5.4. Fix $\rho \in [0, 1]$. For each $n \in \mathbb{N}$, let $X_0^{(n)} \in \mathbb{R}^{n \times n}$ be a matrix with $\operatorname{rank}(X_0^{(n)}) = \lceil\rho n\rceil$. Then the
corresponding ensemble $\{\mathcal{F}(\|\cdot\|_{S_1}, X_0^{(n)})\}$ of feasible cones has upper decay threshold $\theta_\star = 6\rho - 5\rho^2$.

Proof. Let $r = r(n) = \operatorname{rank}(X_0^{(n)})$. From [CRPW10, Prop. 3.11], we have

$$W\bigl(\mathcal{F}(\|\cdot\|_{S_1}, X_0^{(n)}) \cap S^{n^2-1}\bigr)^2 \le 6rn - 5r^2 + 2(n - r) = 6\rho n^2 - 5\rho^2 n^2 + O(n).$$

Dividing both sides by the ambient dimension $n^2$ and taking limits, we see that the conditions of
Corollary 5.3 hold. This gives the result. □
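
The limit taken in this proof can be illustrated numerically. The short sketch below evaluates the width bound quoted above, divided by the ambient dimension $n^2$, for increasing $n$ and compares it with $6\rho - 5\rho^2$. The function names are illustrative only.

```python
import math


def schatten_width_bound_ratio(n, rho):
    """(6rn - 5r^2 + 2(n - r)) / n^2 with r = ceil(rho * n)."""
    r = math.ceil(rho * n)
    return (6 * r * n - 5 * r * r + 2 * (n - r)) / (n * n)


def schatten_threshold(rho):
    """Limiting value 6*rho - 5*rho^2 from Proposition 5.4."""
    return 6.0 * rho - 5.0 * rho * rho


rho = 0.1
for n in (10, 100, 1000, 10000):
    print(n, schatten_width_bound_ratio(n, rho), schatten_threshold(rho))
```

The gap between the two printed columns shrinks at the rate $O(1/n)$, which is the $O(n)$ remainder divided by $n^2$.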

Proposition 5.5. For each $n \in \mathbb{N}$, let $X_0^{(n)}$ be an orthogonal matrix. Then the corresponding ensemble
$\{\mathcal{F}(\|\cdot\|_{\mathrm{Op}}, X_0^{(n)})\}$ of feasible cones has upper decay threshold $\theta_\star = \tfrac{3}{4}$.

Proof. From [CRPW10, Prop. 3.13], we have

$$W\bigl(\mathcal{F}(\|\cdot\|_{\mathrm{Op}}, X_0^{(n)}) \cap S^{n^2-1}\bigr)^2 \le \frac{3n^2 - n}{4} = \frac{3}{4} n^2 + O(n).$$

The result follows upon dividing by $n^2$, taking limits, and applying Corollary 5.3. □

We list all of the bounds computed in our work in Table 1, but we note that several more bounds are 
readily derivable from the Gaussian width calculations in [CRPW10, Sec. 3.4] . 

6 Applications 

We can tackle a variety of applications using the theory developed in Section 4 and the decay threshold 
calculations of Section 5. 
Figure 10 Decay threshold for the $\ell_1$ norm. The left panel shows $\theta_{\ell_1}(\tau)$, the upper decay threshold for the
sequence of feasible cones of the $\ell_1$ norm at $\lceil\tau d\rceil$-sparse vectors as a function of the sparsity $\tau$. The narrow lines
show that $\theta_{\ell_1}(\tau) < \tfrac{1}{2}$ for $\tau < 0.19$, while $\theta_{\ell_1}(\tau) < \tfrac{1}{4}$ for $\tau < 0.06$. The right panel displays level sets of the function
$\theta_{\ell_1}(\tau_x) + \theta_{\ell_1}(\tau_y)$. The thick curve marks the level set $\theta_{\ell_1}(\tau_x) + \theta_{\ell_1}(\tau_y) = 1$; it corresponds to the green curve in
Figure 1.



6.1 Morphological component analysis 

We return to the MCA model of Section 1.1 . Our goal is to analyze when we can deconvolve two signals that 
are sparse in incoherent bases. To apply our theoretical results, we consider the deconvolution ensemble 
from Section 4.1.1. 

Fix sparsity levels $\tau_x$ and $\tau_y$ in $[0, 1]$. For each dimension $d \in \mathbb{N}$, we construct signals $x_0$ and $y_0$ in $\mathbb{R}^d$
that satisfy

$$\operatorname{nnz}(x_0) = \lceil\tau_x d\rceil \quad\text{and}\quad \operatorname{nnz}(y_0) = \lceil\tau_y d\rceil.$$

Draw a random basis $Q$, and suppose that we observe $z_0 = x_0 + Q y_0$.

The $\ell_1$ norm is a natural complexity measure for sparse vectors. Given the side information $\alpha = \|y_0\|_{\ell_1}$,
we pose the constrained MCA problem

$$\text{minimize } \|x\|_{\ell_1} \quad\text{subject to } \|y\|_{\ell_1} \le \alpha \text{ and } x + Qy = z_0. \tag{6.1}$$

The theory of Section 4 describes when (6.1) identifies $(x_0, y_0)$ with overwhelming probability in high
dimensions in terms of decay thresholds for the ensembles $\{\mathcal{F}(\|\cdot\|_{\ell_1}, x_0)\}$ and $\{\mathcal{F}(\|\cdot\|_{\ell_1}, y_0)\}$ indexed by
the dimension $d$.

Observe that, up to rotations, the geometry of the feasible cone $\mathcal{F}(\|\cdot\|_{\ell_1}, w)$ depends only on the
number of nonzero entries in $w$, but not on the positions or magnitudes of the entries. For a fixed sparsity level
$\tau \in [0, 1]$, consider an ensemble with $w^{(d)} \in \mathbb{R}^d$ and $\operatorname{nnz}(w^{(d)}) = \lceil\tau d\rceil$ for each $d \in \mathbb{N}$. In Appendix C,
we compute the optimal upper decay threshold for the ensemble $\{\mathcal{F}(\|\cdot\|_{\ell_1}, w^{(d)})\}$. We denote this value
by $\theta_{\ell_1}(\tau)$ (see Figure 10, left panel). Therefore, the ensembles $\{\mathcal{F}(\|\cdot\|_{\ell_1}, x_0)\}$ and $\{\mathcal{F}(\|\cdot\|_{\ell_1}, y_0)\}$ have decay
thresholds $\theta_{\ell_1}(\tau_x)$ and $\theta_{\ell_1}(\tau_y)$.

Theorem 4.3 implies that our deconvolution method (6.1) for sparse vectors succeeds with overwhelming
probability in high dimensions so long as $\theta_{\ell_1}(\tau_x) + \theta_{\ell_1}(\tau_y) < 1$. The right panel of Figure 10
shows the level sets of the function $\theta_{\ell_1}(\tau_x) + \theta_{\ell_1}(\tau_y)$ for $(\tau_x, \tau_y)$ on the unit square $[0, 1]^2$. The green curve
is the level set

$$\{(\tau_x, \tau_y) : \theta_{\ell_1}(\tau_x) + \theta_{\ell_1}(\tau_y) = 1\}.$$

Theorem 4.3 implies that program (6.1) succeeds with overwhelming probability when the joint sparsity
$(\tau_x, \tau_y)$ lies below the green curve.

On the other hand, our computations show that the upper decay threshold $\theta_{\ell_1}(\tau)$ is numerically equal
to the lower decay threshold $\kappa_{\ell_1}(\tau)$; see the discussion in Appendix C.2. Theorem 4.4 implies that the
sparse deconvolution method (6.1) fails with overwhelming probability for sparsity levels $(\tau_x, \tau_y)$ in the
region above the green curve. In other words, the green curve on the right panel of Figure 10 delineates a
sharp transition between success and failure for constrained MCA (6.1). The green curve in Figure 10(b)
is the same as the green curve in Figure 1.



6.1.1 Strong guarantees

The theory of Section 4.3 allows us to provide a uniform recovery guarantee. For a fixed draw of the
random basis, constrained MCA can deconvolve all sufficiently sparse pairs of vectors with overwhelming
probability in high dimensions.

Fix the sparsity $(\tau_x, \tau_y)$ and the ambient dimension $d$. Suppose $Q \in O_d$ is a random basis. By Lemma 2.3,
constrained MCA can identify every $(\tau_x, \tau_y)$-sparse pair $(x_0, y_0)$ given the observation $z_0 = x_0 + Q y_0$,
provided that the event

$$\mathcal{F}(\|\cdot\|_{\ell_1}, x_0) \cap (-Q\,\mathcal{F}(\|\cdot\|_{\ell_1}, y_0)) = \{0\} \quad\text{for each } (\tau_x, \tau_y)\text{-sparse pair } (x_0, y_0) \tag{6.2}$$

holds. Theorem 4.11 guarantees that the probability of event (6.2) is large when some associated decay
thresholds are small enough; let us describe how to verify the required technical assumptions.

First, the results in Appendix C allow us to compute upper decay thresholds at levels $\psi \ge 0$ for the
ensemble $\{\mathcal{F}(\|\cdot\|_{\ell_1}, w^{(d)})\}$ from Section 6.1 consisting of feasible cones for the $\ell_1$ norm at $\tau$-sparse vectors.
We extend our earlier notation by writing this quantity as $\theta_{\ell_1}(\tau, \psi)$. This is the first element required to
check the hypotheses of Theorem 4.11.

We also require information on the total number of feasible cones under consideration. Let

$$\mathcal{K}_x^{(d)} = \bigcup \{\mathcal{F}(\|\cdot\|_{\ell_1}, x_0)\} \quad\text{and}\quad \mathcal{K}_y^{(d)} = \bigcup \{\mathcal{F}(\|\cdot\|_{\ell_1}, y_0)\},$$

where the unions take place over all $\lceil\tau_x d\rceil$-sparse $x_0 \in \mathbb{R}^d$ and all $\lceil\tau_y d\rceil$-sparse $y_0 \in \mathbb{R}^d$. There are exactly
$2^k \binom{d}{k}$ different (but unitarily equivalent) feasible cones of the $\ell_1$ norm at vectors in $\mathbb{R}^d$ with $k$ nonzero
entries, one for each sign/sparsity pattern. This corresponds to the number of $(k-1)$-dimensional faces of
the crosspolytope; see, e.g., [Don06b, Sec. 3.3]. With $k = \lceil\tau d\rceil$, it follows from the proof of Proposition 3.10
that

$$d^{-1} \log\Bigl(2^k \binom{d}{k}\Bigr) \to \tau \log(2) + H(\tau) =: E(\tau), \tag{6.3}$$

uniformly as $d \to \infty$. By continuity of the exponential, for every $\eta > 0$, the
number of feasible cones is bounded above by

$$|\mathcal{K}_x^{(d)}| \le e^{d(E(\tau_x) + \eta)}$$

for large enough $d$. Similarly, $|\mathcal{K}_y^{(d)}| \le e^{d(E(\tau_y) + \eta)}$ for all sufficiently large $d$.
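
The convergence in (6.3) can be observed directly. The following sketch assumes natural logarithms and the binary entropy $H(\tau) = -\tau\log\tau - (1-\tau)\log(1-\tau)$. It compares the finite-$d$ exponent $d^{-1}\log\bigl(2^k\binom{d}{k}\bigr)$ with $E(\tau)$, using the log-gamma function to avoid forming large binomial coefficients. The function names are hypothetical.

```python
import math


def entropy(t):
    """Binary entropy with natural logarithms (assumed form of H)."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log(t) - (1.0 - t) * math.log(1.0 - t)


def cone_count_exponent(tau):
    """E(tau) = tau*log(2) + H(tau), the limit in (6.3)."""
    return tau * math.log(2.0) + entropy(tau)


def finite_d_exponent(d, tau):
    """d^{-1} * log(2^k * C(d, k)) with k = ceil(tau * d)."""
    k = math.ceil(tau * d)
    log_binom = math.lgamma(d + 1) - math.lgamma(k + 1) - math.lgamma(d - k + 1)
    return (k * math.log(2.0) + log_binom) / d


tau = 0.1
for d in (100, 1000, 10000):
    print(d, finite_d_exponent(d, tau), cone_count_exponent(tau))
```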

We have now collected enough information to apply our theory. By Theorem 4.11, the event (6.2) holds
with overwhelming probability in high dimensions so long as

$$\theta_{\ell_1}(\tau_x, E(\tau_x) + E(\tau_y)) + \theta_{\ell_1}(\tau_y, E(\tau_x) + E(\tau_y)) < 1. \tag{6.4}$$

The blue curve in Figure 1 shows the level set

$$\{(\tau_x, \tau_y) : \theta_{\ell_1}(\tau_x, E(\tau_x) + E(\tau_y)) + \theta_{\ell_1}(\tau_y, E(\tau_x) + E(\tau_y)) = 1\}.$$

When the sparsity level $(\tau_x, \tau_y)$ lies below the blue curve, inequality (6.4) holds, so the deconvolution
method (6.1) succeeds at deconvolving every pair $(x_0, y_0)$ of $(\tau_x, \tau_y)$-sparse vectors with overwhelming
probability in high dimensions.

6.1.2 Numerical experiment for constrained MCA

The following numerical experiment illustrates the accuracy of these theoretical results. We fix the
dimension $d = 100$. For each pair $(k_x, k_y) \in \{0, 1, \dots, 100\}^2$, we repeat the following procedure 25 times.

1. Draw $x_0, y_0 \in \mathbb{R}^d$ with $k_x$ and $k_y$ nonzero elements, respectively. The locations of the nonzero
elements are chosen at random, and these elements are equally likely to be $+1$ or $-1$.

2. Generate a random basis $Q \in O_d$; see Remark 6.1 below.

3. Solve (6.1) for the optimal point $(x^\star, y^\star)$ with the numerical optimization software CVX [GB08,
GB10].

4. Declare success if $\|x^\star - x_0\|_{\ell_\infty} < 10^{-4}$.
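
Steps 1 and 4 of this procedure can be sketched in a few lines of Python. This is an illustration only and not the code used for the experiment: the helper names are hypothetical, the success test assumes that the norm in step 4 is the $\ell_\infty$ norm, and steps 2 and 3 (drawing the random basis and solving (6.1)) are omitted.

```python
import random


def sparse_sign_vector(d, k, rng):
    """Length-d vector with k nonzero entries at random positions, each +1 or -1."""
    x = [0.0] * d
    for idx in rng.sample(range(d), k):
        x[idx] = rng.choice((-1.0, 1.0))
    return x


def is_success(x_star, x0, tol=1e-4):
    """Success criterion of step 4: max_i |x*_i - x0_i| < tol."""
    return max(abs(a - b) for a, b in zip(x_star, x0)) < tol


rng = random.Random(0)
d, kx, ky = 100, 10, 20
x0 = sparse_sign_vector(d, kx, rng)
y0 = sparse_sign_vector(d, ky, rng)
print(sum(1 for v in x0 if v != 0.0), sum(1 for v in y0 if v != 0.0))
```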

The background of Figure 1 shows the results of this experiment as a function of $\tau_x = k_x/d$ and $\tau_y = k_y/d$.
The yellow curve marks the empirical 50% success line, and we note that it tracks the green theoretical
curve closely. It emerges that $d = 100$ is already large enough to see the high dimensional behavior
described by our theoretical results.

Remark 6.1. A random basis $Q \in O_d$ is often defined through a conceptually simple two-step operation.
First, draw a square $d \times d$ Gaussian matrix; second, orthogonalize the columns of this matrix via the
Gram-Schmidt procedure. Although this definition has the flavor of a numerical algorithm, the reader 
should be aware that this conceptual process is not numerically stable. Moreover, standard procedures 
for stabilizing the orthogonalization do not preserve the Haar measure. For a straightforward, numerically 
stable approach to generating random bases, see [Mez07] . 
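
For readers who want a concrete starting point, here is a minimal sketch of one common recipe, which we understand to be in the spirit of [Mez07]: QR-factor a Gaussian matrix and then correct the column signs so that the triangular factor has a positive diagonal. The sketch assumes NumPy is available; the function name `random_basis` is hypothetical, and the code is not taken from this work.

```python
import numpy as np


def random_basis(d, rng):
    """Draw a d x d orthogonal matrix intended to follow the Haar measure.

    QR-factor a Gaussian matrix, then flip column signs so that the
    diagonal of R is positive, which makes the factorization unique."""
    z = rng.standard_normal((d, d))
    q, r = np.linalg.qr(z)
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0  # guard against an exactly zero diagonal entry
    return q * signs  # scales column j of q by signs[j]


rng = np.random.default_rng(0)
Q = random_basis(5, rng)
print(np.allclose(Q.T @ Q, np.eye(5)))
```

The sign correction is the step that distinguishes this recipe from a plain call to a QR routine, whose output need not be Haar distributed.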

6.2 Secure and robust channel coding 

Next, we study the secure channel coding scheme of Section 1.3.1. We want to analyze when the
receiver can decode a transmitted message that is subject to a sparse corruption. The difficulty depends
on the sparsity level $\tau$ of the corruption, where $\tau \in [0, 1]$. Let us introduce an ensemble of deconvolution
problems. In each dimension $d \in \mathbb{N}$, choose a $d$-bit message $m_0 \in \{\pm 1\}^d$ and a corruption $c_0 \in \mathbb{R}^d$ with
$\operatorname{nnz}(c_0) = \lceil\tau d\rceil$. Draw a random basis $Q$ known to both the receiver and transmitter. The receiver observes
$z_0 = Q m_0 + c_0$, the encoded message plus the sparse interference.

The natural complexity measure for the sparse corruption $c_0$ is the $\ell_1$ norm, while the $\ell_\infty$ norm is the
appropriate complexity measure for the sign vector $m_0$. Note that $\|m_0\|_{\ell_\infty} = 1$. To recover the original
message, the receiver solves the problem

$$\text{minimize } \|c\|_{\ell_1} \quad\text{subject to } \|m\|_{\ell_\infty} \le 1 \text{ and } c + Qm = z_0. \tag{6.5}$$

This approach succeeds when $(m_0, c_0)$ is the unique optimal point of (6.5).

We consider two types of corruptions. Benign corruptions are taken in any manner that is independent
of $Q$; this would happen, for instance, when the corruption is generated by an adversary with no
knowledge of $Q$ or the transmission $Q m_0$. Adversarial corruptions are worst-case sparse corruptions;
these corruptions may model malicious interference or an erasure channel that sets some of the
coordinates of $z_0$ to zero. We first consider the benign case, and then consider the adversarial case
in Section 6.2.2.
6.2.1 Benign corruptions

Since the corruption is chosen independently of the basis $Q$, the convex deconvolution method (6.5)
succeeds if it can identify the single pair $(c_0, m_0)$. Theorems 4.3 and 4.4 describe where the procedure succeeds, or
fails, with overwhelming probability in high dimensions, but we must first determine some decay
thresholds.

The feasible cone of the $\ell_\infty$ norm at a sign vector is unitarily equivalent to an orthant. From Table 1,
we see that the sequence $\{\mathbb{R}^d_+ : d \in \mathbb{N}\}$ of orthants has an upper decay threshold of $\theta_{\mathbb{R}^d_+} = \tfrac{1}{2}$ and a matching
lower decay threshold $\kappa_{\mathbb{R}^d_+} = \tfrac{1}{2}$. Appendix C defines an upper decay threshold $\theta_{\ell_1}(\tau)$ for the sequence of
feasible cones of the $\ell_1$ norm at $\tau$-sparse vectors.

With these thresholds in hand, Theorem 4.3 guarantees that (6.5) succeeds with overwhelming
probability in high dimensions provided that $\theta_{\mathbb{R}^d_+} + \theta_{\ell_1}(\tau) < 1$. Equivalently, we need $\theta_{\ell_1}(\tau) < \tfrac{1}{2}$. On the other
hand, Theorem 4.4 shows that (6.5) fails with overwhelming probability in high dimensions if the
corresponding lower decay thresholds satisfy $\kappa_{\ell_1}(\tau) + \kappa_{\mathbb{R}^d_+} > 1$. This condition holds when $\kappa_{\ell_1}(\tau) > \tfrac{1}{2}$.

Numerically, we find that $\theta_{\ell_1}(\tau) < \tfrac{1}{2}$ when the sparsity level $\tau < 0.193$; see the left panel of Figure 10.
The fact that $\kappa_{\ell_1}(\tau) = \theta_{\ell_1}(\tau)$ to numerical precision implies that $\kappa_{\ell_1}(\tau) > \tfrac{1}{2}$ for $\tau > 0.193$. In other words, if
fewer than 19% of the entries of $c_0$ are nonzero, the scheme (6.5) succeeds with overwhelming probability
in high dimensions; otherwise, it fails with overwhelming probability in high dimensions. This sharp
transition corresponds to the location of the dashed line in Figure 3.

6.2.2 The adversarial case 

In the adversarial case, the corruptions are sparse but may depend on the basis $Q$ and the message $m_0$.
To ensure that no corruption with $\lceil\tau d\rceil$ nonzero entries can cause (6.5) to fail, we must verify that (6.5)
succeeds at identifying $(c_0, m_0)$ for every $\lceil\tau d\rceil$-sparse vector $c_0$. From Lemma 2.3, this is equivalent to the
event

$$\mathcal{F}(\|\cdot\|_{\ell_1}, c_0) \cap (-Q\,\mathcal{F}(\|\cdot\|_{\ell_\infty}, m_0)) = \{0\} \quad\text{for every } \lceil\tau d\rceil\text{-sparse vector } c_0. \tag{6.6}$$

We can use Theorem 4.11 to verify that event (6.6) holds with overwhelming probability in high
dimensions. Let us collect the additional information required to verify the technical assumptions of this
theorem.

For $m_0 \in \{\pm 1\}^d$, define the singleton $\mathcal{K}_m^{(d)} = \{\mathcal{F}(\|\cdot\|_{\ell_\infty}, m_0)\}$. The feasible cone $\mathcal{F}(\|\cdot\|_{\ell_\infty}, m_0)$ is
unitarily equivalent to the orthant $\mathbb{R}^d_+$, so the sequence $\{\mathcal{K}_m^{(d)}\}$ of sets has an upper decay threshold at level $\psi$ of
$\theta_{\mathbb{R}^d_+}(\psi)$ defined in Proposition 4.9.

Define the set $\mathcal{K}_c^{(d)} = \bigcup \{\mathcal{F}(\|\cdot\|_{\ell_1}, c_0)\}$, where the union occurs over all vectors $c_0$ with $\lceil\tau d\rceil$ nonzero
elements. By (6.3), the size of $\mathcal{K}_c^{(d)}$ is bounded by

$$|\mathcal{K}_c^{(d)}| \le e^{d(E(\tau) + \eta)}$$

for any $\eta > 0$ and all sufficiently large $d$.

As $\mathcal{K}_m^{(d)}$ is a singleton, Theorem 4.11 implies that event (6.6) holds with overwhelming probability in
high dimensions whenever

$$\theta_{\ell_1}(\tau, E(\tau)) + \theta_{\mathbb{R}^d_+}(E(\tau)) < 1. \tag{6.7}$$

Computing $\theta_{\ell_1}(\tau, \psi)$ and $\theta_{\mathbb{R}^d_+}(\psi)$ numerically, we find that for all $\tau < 0.0186$, inequality (6.7) holds. We
conclude that our channel coding scheme is robust to all adversarial corruptions so long as the
corruptions have no more than about 1.8% nonzero entries. This computation is illustrated in Figure 11.



[Figure 11 plot: "Channel coding adversarial bound". Horizontal axis: sparsity $\tau$.]

Figure 11 Calculating the adversarial guarantees for channel coding. The lower two curves in the figure
correspond to $\theta_{\ell_1}(\tau, E(\tau))$ and $\theta_{\mathbb{R}^d_+}(E(\tau))$. The upper curve indicates the sum $\theta_{\ell_1}(\tau, E(\tau)) + \theta_{\mathbb{R}^d_+}(E(\tau))$. For $\tau < 0.018$,
the sum lies below one, so Theorem 4.11 implies that event (6.6) holds with overwhelming probability.



6.2.3 Numerical experiment 

We perform two numerical experiments to complement our theory, one for the benign corruptions and 
the other for a specific type of malicious erasure. 

For dimensions $d = 100$ and $d = 300$ and for each of 70 equally spaced values of $\tau \in [0, 0.35]$, we test
the benign corruption case by repeating the following procedure 200 times:

1. Draw a binary vector $m_0 \in \{\pm 1\}^d$ at random.

2. Choose a corruption $c_0$ with $k = \lceil\tau d\rceil$ nonzero elements; the support of $c_0$ is random, and the
nonzero elements are taken to be $\pm 1$ with equal probability.

3. Generate a random basis $Q \in O_d$; see Remark 6.1.

4. Solve (6.5) with the observation $z_0 = Q m_0 + c_0$ with the numerical optimization software CVX; call
$(m^\star, c^\star)$ the optimal point.

5. Declare success if $\|m^\star - m_0\|_{\ell_\infty} < 10^{-4}$.

The second experiment incorporates a malicious erasure. As in the benign case, the experiment is run
for dimensions $d = 100$ and $d = 300$ and for 70 equally spaced values of $\tau$ between zero and one. For each
of these parameters, we repeat the following 200 times:

1. Draw a message $m_0 \in \{\pm 1\}^d$ at random, and generate a random basis $Q \in O_d$.

2. Set the observation $z_0 = \operatorname{erase}(Q m_0, \lceil\tau d\rceil)$, where $\operatorname{erase}(x, k)$ sets the $k$ largest-magnitude elements
of $x$ to zero.

3. Solve (6.5) with CVX for the optimal point $(m^\star, c^\star)$.

4. Declare success if $\|m^\star - m_0\|_{\ell_\infty} < 10^{-4}$.
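
The erasure operation in step 2 of the second experiment is simple to implement. The sketch below is one possible reading of its description, written in plain Python; ties between entries of equal magnitude are broken by position, which is an assumption, since the text does not specify a tie-breaking rule.

```python
def erase(x, k):
    """Return a copy of x with its k largest-magnitude entries set to zero."""
    if not 0 <= k <= len(x):
        raise ValueError("k must lie between 0 and len(x)")
    # Indices sorted by decreasing magnitude; ties broken by position.
    order = sorted(range(len(x)), key=lambda i: -abs(x[i]))
    erased = list(x)
    for i in order[:k]:
        erased[i] = 0.0
    return erased


print(erase([0.5, -3.0, 2.0, -0.1], 2))
```

With this definition, the implied corruption $c_0 = \operatorname{erase}(Q m_0, k) - Q m_0$ has at most $k$ nonzero entries, so it fits the sparse corruption model of (6.5).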

The curves in Figure 3 show the results of these experiments. For benign corruptions, the theory 
matches the experiments. The empirical 50% success rate occurs very near the predicted sparsity value 
$\tau = 0.193$, and the transition region is narrower for larger $d$.

The empirical evidence suggests that our adversarial guarantees are conservative for the type of ma- 
licious corruption used in the experiment. This is expected, as we have no reason to believe that such 
erasures correspond to the worst-case corruption. However, the empirical transition between success 
and failure near $\tau \approx 0.05$ suggests that our adversarial bound lies within a factor of two or three of the best
possible guarantee. 

6.3 Low-rank matrices under sparse corruptions 

Consider the problem of separating a low-rank square matrix from a corruption that is sparse in a random
basis. The parameters that determine the difficulty are the proportional rank $\rho \in [0, 1]$ and the sparsity
level $\tau \in [0, 1]$. Let us introduce a deconvolution ensemble. For each side length $n \in \mathbb{N}$, choose a
low-rank matrix $X_0 \in \mathbb{R}^{n \times n}$ with $\operatorname{rank}(X_0) = \lceil\rho n\rceil$ and a sparse matrix $Y_0 \in \mathbb{R}^{n \times n}$ with $\operatorname{nnz}(Y_0) = \lceil\tau n^2\rceil$. Draw a
random basis $\mathcal{Q}$ for $\mathbb{R}^{n \times n}$, and suppose we observe $Z_0 = X_0 + \mathcal{Q}(Y_0)$.

To promote low rank, we use the Schatten 1-norm, and, to promote sparsity, we use the matrix $\ell_1$
norm. Given the side information $\alpha = \|Y_0\|_{\ell_1}$, we pose the convex deconvolution method

$$\text{minimize } \|X\|_{S_1} \quad\text{subject to } \|Y\|_{\ell_1} \le \alpha \text{ and } X + \mathcal{Q}(Y) = Z_0. \tag{6.8}$$

We study when $(X_0, Y_0)$ is the unique solution to (6.8) with overwhelming probability in high dimensions.
The feasible cone of the $\ell_1$ matrix norm at a matrix $Y_0 \in \mathbb{R}^{n \times n}$ with $k$ nonzero entries is isomorphic to the
feasible cone of the $\ell_1$ norm at a sparse vector $y_0 := \operatorname{vec}(Y_0) \in \mathbb{R}^{n^2}$ with $k$ nonzero entries. It follows that
the value $\theta_{\ell_1}(\tau)$ from (C.14) is an upper decay threshold for the ensemble $\{\mathcal{F}(\|\cdot\|_{\ell_1}, Y_0)\}$ of feasible cones
indexed by the ambient dimension $d = n^2$ of the matrix space $\mathbb{R}^{n \times n}$.

By Proposition 5.4, we see that $\theta_{S_1}(\rho) := 6\rho - 5\rho^2$ is an upper decay threshold for the ensemble
$\{\mathcal{F}(\|\cdot\|_{S_1}, X_0)\}$ of feasible cones, indexed by the ambient dimension. The green line in Figure 4 is the level
set

$$\{(\rho, \tau) : \theta_{\ell_1}(\tau) + \theta_{S_1}(\rho) = 1\}.$$

For $(\rho, \tau)$ pairs lying below the curve, Theorem 4.3 implies that our deconvolution method (6.8) succeeds
with overwhelming probability in high dimensions.

6.3.1 Numerical experiment 

Let us summarize the experiment in Figure 4. The matrix side length $n = 35$ is fixed, and for each pair
$(\rho, \tau)$ in the set $\{\tfrac{1}{35}, \tfrac{2}{35}, \dots, 1\}^2$, we repeat the following procedure 25 times:

1. Draw a matrix $X_0 = Q_L \Lambda Q_R \in \mathbb{R}^{n \times n}$ with rank $r = \lceil \rho n \rceil$, where $\Lambda$ is a diagonal matrix that satisfies
$\Lambda_{ii} = 1$ for $i = 1, \dots, r$ and $\Lambda_{ii} = 0$ otherwise, and $Q_L, Q_R$ are independent random bases in $\mathsf{O}_n$.

2. Generate a random matrix $Y_0 \in \mathbb{R}^{n \times n}$ with $\lceil \tau n^2 \rceil$ nonzero entries; the nonzero entries in $Y_0$ take the
values $+1$ or $-1$ with equal probability.

3. Generate a random basis $\mathcal{Q}$ for $\mathbb{R}^{n \times n}$.

4. Solve (6.8) with the observation $Z_0 = X_0 + \mathcal{Q}(Y_0)$ using CVX, and set $(X^\star, Y^\star)$ to the optimal point.

5. Declare success if $\|X^\star - X_0\| < 10^{-4}$.

From Figure 4, we see that the theoretical bound appears nearly sharp in the range where $\tau > 0.2$, while
the experiment suggests that our bound can be tightened in the highly sparse regime $\tau \to 0$.
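
As an illustration only, the data-generation steps 1 and 2 and the success test of step 5 can be sketched in Python as follows. The function names, the use of Gram-Schmidt to produce random orthogonal matrices, the Frobenius norm in the success test, and the small offset that guards the ceiling against floating-point error are all assumptions of this sketch rather than details of the experiment. Steps 3 and 4 (the random basis on the matrix space and the CVX solve) are omitted.

```python
import math
import random


def random_orthogonal(n, rng):
    """Return an n x n matrix with orthonormal rows (Gram-Schmidt on Gaussian vectors)."""
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for q in rows:  # remove the components along the rows found so far
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # discard a numerically degenerate draw
            rows.append([a / norm for a in v])
    return rows


def draw_low_rank(n, rho, rng):
    """Step 1: X0 = Q_L * Lambda * Q_R, where Lambda has r = ceil(rho * n) ones on its diagonal."""
    r = math.ceil(rho * n - 1e-9)  # offset is a hypothetical guard against round-off
    q_left = random_orthogonal(n, rng)
    q_right = random_orthogonal(n, rng)
    x0 = [[sum(q_left[i][k] * q_right[k][j] for k in range(r)) for j in range(n)]
          for i in range(n)]
    return x0, r


def draw_sparse_sign(n, tau, rng):
    """Step 2: Y0 with ceil(tau * n^2) nonzero entries, each +1 or -1 with equal probability."""
    k = math.ceil(tau * n * n - 1e-9)
    y0 = [[0.0] * n for _ in range(n)]
    for idx in rng.sample(range(n * n), k):
        y0[idx // n][idx % n] = rng.choice((-1.0, 1.0))
    return y0, k


def is_success(x_star, x0, tol=1e-4):
    """Step 5, assuming the distance is measured in the Frobenius norm."""
    err = math.sqrt(sum((a - b) ** 2
                        for row_a, row_b in zip(x_star, x0)
                        for a, b in zip(row_a, row_b)))
    return err < tol
```

A matrix built this way has exactly $r$ singular values equal to one, so its squared Frobenius norm equals $r$; this gives a quick sanity check on the generator.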

6.4 Assorted matrix deconvolution problems

We conclude this section with some other combinations of structured square matrices that we can
separate using the convex deconvolution method (1.2). In each of these applications, we observe a
superposition of the form $Z_0 = X_0 + \mathcal{Q}(Y_0) \in \mathbb{R}^{n \times n}$, where $\mathcal{Q}$ is a random basis for the matrix space $\mathbb{R}^{n \times n}$. We
abbreviate the ambient dimension $d = n^2$. We consider various structures for $X_0$ and $Y_0$ (low rank,
orthogonal, sparse, or sign matrices), and we show that our theory quickly identifies a regime where an
appropriate convex deconvolution method succeeds with overwhelming probability in high dimensions.

6.4.1 Orthogonal and sparse matrices

Fix a sparsity level $\tau \in [0, 1]$. For each side length $n \in \mathbb{N}$, choose an orthogonal matrix $X_0 \in \mathsf{O}_n$ and a sparse
matrix $Y_0 \in \mathbb{R}^{n \times n}$ with $\operatorname{nnz}(Y_0) = \lceil \tau n^2 \rceil$. We use the operator norm as a complexity measure for $X_0$ and the
matrix $\ell_1$ norm as a complexity measure for $Y_0$. Since $\|X_0\|_{\mathrm{Op}} = 1$, we pose the convex deconvolution
method

$$\text{minimize } \|Y\|_{\ell_1} \quad \text{subject to } \|X\|_{\mathrm{Op}} \le 1 \ \text{ and } \ X + \mathcal{Q}(Y) = Z_0. \tag{6.9}$$

The interchange of the objective and constraint as compared with (1.2) poses no difficulty because the 
optimality conditions of Lemma 2.3 are symmetric with respect to the objective and constraint. 

By Theorem 4.3, the convex deconvolution method (6.9) succeeds with overwhelming probability in
high dimensions so long as

$$\theta_{\ell_1}(\tau) + \theta_{\mathrm{Op}} < 1,$$

where $\theta_{\ell_1}(\tau)$, defined in Appendix C, is an upper decay threshold for the ensemble $\{\mathcal{F}(\|\cdot\|_{\ell_1}, Y_0)\}$ and
$\theta_{\mathrm{Op}} = \tfrac{3}{4}$ is an upper decay threshold for the ensemble $\{\mathcal{F}(\|\cdot\|_{\mathrm{Op}}, X_0)\}$ by Proposition 5.5. Therefore,
program (6.9) succeeds with overwhelming probability in high dimensions whenever

$$\theta_{\ell_1}(\tau) < \tfrac{1}{4}.$$

This occurs for $\tau < 0.06$; see the left panel of Figure 10. We conclude that (6.9) deconvolves an orthogonal
matrix $X_0$ from a matrix $\mathcal{Q}(Y_0)$ that is sparse in a random basis with high probability when no more than about
6% of the elements of $Y_0$ are nonzero.

6.4.2 Low-rank and sign matrices 

Fix the proportional rank $\rho \in [0, 1]$. For each side length $n \in \mathbb{N}$, choose a low-rank matrix $X_0 \in \mathbb{R}^{n \times n}$ with
$\operatorname{rank}(X_0) = \lceil \rho n \rceil$ and a sign matrix $Y_0 \in \{\pm 1\}^{n \times n}$. We use the Schatten 1-norm as a complexity measure
for rank and the matrix $\ell_\infty$ norm as a complexity measure for sign matrices. Given that $\|Y_0\|_{\ell_\infty} = 1$, we
consider the convex deconvolution method

$$\text{minimize } \|X\|_{S_1} \quad \text{subject to } \|Y\|_{\ell_\infty} \le 1 \ \text{ and } \ X + \mathcal{Q}(Y) = Z_0. \tag{6.10}$$

We invoke Theorem 4.3 to see that (6.10) succeeds with overwhelming probability in high dimensions
whenever $\theta_{S_1}(\rho) + \theta_{\mathbb{R}^d_+} < 1$. Here, $\theta_{S_1}(\rho)$ is an upper decay threshold for the ensemble $\{\mathcal{F}(\|\cdot\|_{S_1}, X_0)\}$ of
feasible cones, and $\theta_{\mathbb{R}^d_+}$ is an upper decay threshold for the ensemble of nonnegative orthants. By
Proposition 3.10, we have $\theta_{\mathbb{R}^d_+} = \tfrac{1}{2}$, while Proposition 5.4 gives $\theta_{S_1}(\rho) = 6\rho - 5\rho^2$. Therefore, the convex
deconvolution method (6.10) succeeds with overwhelming probability in high dimensions so long as

$$6\rho - 5\rho^2 < \tfrac{1}{2}.$$

This bound is valid when $\rho < 0.09$. We conclude that (6.10) can deconvolve a low-rank matrix from a sign
matrix in a random basis with overwhelming probability if $\operatorname{rank}(X_0) < 0.09\,n$, where $n$ is the side length of $X_0$.

6.4.3 Low-rank and orthogonal matrices

Let $\rho \in [0, 1]$ be a proportional rank parameter. For each side length $n \in \mathbb{N}$, choose a low-rank matrix
$X_0 \in \mathbb{R}^{n \times n}$ with $\operatorname{rank}(X_0) = \lceil \rho n \rceil$ and an orthogonal matrix $Y_0 \in \mathsf{O}_n$. With the usual choice of complexity
measures, the convex deconvolution method is

$$\text{minimize } \|X\|_{S_1} \quad \text{subject to } \|Y\|_{\mathrm{Op}} \le 1 \ \text{ and } \ X + \mathcal{Q}(Y) = Z_0. \tag{6.11}$$

By Theorem 4.3, program (6.11) succeeds with overwhelming probability in high dimensions so long as
$\theta_{S_1}(\rho) + \theta_{\mathrm{Op}} < 1$. Propositions 5.4 and 5.5 imply that this occurs whenever

$$6\rho - 5\rho^2 < \tfrac{1}{4}.$$

For instance, it suffices that $\rho < 0.04$. Therefore, the convex deconvolution method (6.11) can identify
a superposition of a low-rank matrix and an orthogonal matrix with overwhelming probability in high
dimensions so long as $\operatorname{rank}(X_0) < 0.04\,n$, where $n$ is the side length of $X_0$.
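
The rank bounds quoted for (6.10) and (6.11) can be checked with a few lines of arithmetic. The sketch below assumes that the complexity budget left for the low-rank component is 1/2 when the second component is a sign matrix and 1/4 when it is an orthogonal matrix, and it solves $6\rho - 5\rho^2 = b$ for the smaller root. The function names are illustrative.

```python
import math


def theta_s1(rho):
    """Upper decay threshold 6*rho - 5*rho^2 for the Schatten 1-norm (Proposition 5.4)."""
    return 6.0 * rho - 5.0 * rho ** 2


def max_rank_ratio(budget):
    """Smaller root of 5*rho^2 - 6*rho + budget = 0.

    theta_s1 is increasing on [0, 0.6], so theta_s1(rho) < budget holds
    for every rho below this root.
    """
    return (3.0 - math.sqrt(9.0 - 5.0 * budget)) / 5.0


if __name__ == "__main__":
    for label, budget in (("sign matrix", 0.5), ("orthogonal matrix", 0.25)):
        print(label, round(max_rank_ratio(budget), 4))
```

The two roots are approximately 0.0901 and 0.0432, which is consistent with the rounded conditions $\rho < 0.09$ and $\rho < 0.04$.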



7 Prior art and future directions 

This work occupies a unique place in the literature on deconvolution. The analysis is highly general and 
applies to many problems. At the same time, the results are sharp or nearly sharp. This final section offers 
a wide-angle view of the field of deconvolution, from applications to analytical techniques, with a focus 
on methods based on convex optimization. We conclude by discussing some extensions of our current 
approach in the hope of encouraging further development in this field. 

7.1 A short history of convex deconvolution and incoherence

The use of convex optimization for signal deconvolution has a long history. Early predecessors to
morphological component analysis come from the work of Claerbout & Muir [CM73] and Taylor et al. [TBM79],
where $\ell_1$ minimization is used to identify sparse spike trains from an observed seismic trace.

Deconvolution methods based on $\ell_1$ minimization were put on a rigorous footing in the 1980s with the
work of Santosa & Symes [SS86] and Donoho & Stark [DS89] . These results, either implicitly or explicitly, 
rely on incoherence in the form of an uncertainty principle. The work of Donoho & Huo [DH01] formalizes 
the notion of incoherence. Incoherent models, both random and deterministic, now pervade the sparse 
deconvolution literature [SDC03, SED05, ESQD05, HB12, SKPB12]. 

In the last decade, new classes of convex regularizers have been introduced for solving inverse
problems in signal processing. In particular, the Schatten 1-norm is used for problems involving low-rank
matrices [Faz02, RFP10]. Deconvolution methods that involve the Schatten 1-norm include robust principal
component analysis [CLMW11, XCS10a, XCS10b, MT11] and latent variable selection [CPW10]. Rigorous
theoretical results for these techniques typically involve some type of spectral incoherence assumption. 

7.2 The neighborhood of this work 

We take much of our inspiration from the geometric analysis of linear inverse problems in [CRPW10]. 
Indeed, the geometric optimality condition (Lemma 2.3) is a direct generalization of a geometric re- 
sult [CRPW10, Prop. 2.1] for linear inverse problems. Moreover, the Gaussian width bounds from that 
work prove useful for computing the decay thresholds in this research. 

A related line of work, due to Negahban et al. [NRWY09, NW10], is based on the concept of restricted
strong convexity. The results in these papers are sharp within constant factors, but they do not yield
bounds as precise as ours. Another general approach to deconvolution appears in [HB12], where a
deterministic incoherence condition leads to recovery guarantees, even in nonconvex settings. The recovery
bounds available through this method are not competitive with the guarantees we provide.

Very recent work [PBS12] on the recovery of sparsely corrupted signals offers deconvolution
guarantees when the sparsity is mildly sublinear in the dimension. Their model is similar to
our MCA formulation in Section 1.1, but the results do not identify the phase transition between success
and failure.

7.2.1 Random geometry and convex optimization 

Methods from random geometry have led to many powerful results in convex optimization. We now trace 
one particular line of development that makes the computations in Appendix C possible. 

Vershik & Sporyshev [VS86] use an asymptotic analysis of polytope angles to analyze the average-case
behavior of the simplex method for linear programming. The underlying formulas have their roots in the
results of Ruben [Rub60], although some of the ideas apparently go back to the work of Schläfli from the
mid-nineteenth century; see Ruben's paper for a discussion.

The analysis of Vershik & Sporyshev fed a line of investigation on the expected face counts of
randomly projected polytopes [AS92, BH99], a topic of theoretical interest in combinatorial geometry.
These computations resurfaced in convex optimization with the recent line of work of Donoho &
Tanner [Don04, DT05, Don06b, DT09a, DT09b, DT10a, DT10b]. These articles present precise results on the
behavior of convex optimization methods for solving several linear inverse problems under a random
measurement model. The asymptotic polytope angle calculations of Donoho & Tanner allow us to
calculate the decay thresholds for an ensemble consisting of feasible cones for the $\ell_1$ norm at vectors whose
sparsity is proportional to the ambient dimension.

This asymptotic polytope angle approach also yields stability guarantees for basis pursuit [XH11]. 
Furthermore, it has been used to establish that iteratively reweighted basis pursuit can provide strictly 
stronger guarantees than standard basis pursuit [XKAH10, KXAH10, KXAH11]. 

Our approach to random geometry differs from these earlier works because it starts with the modern
theory of spherical integral geometry. Previous research was based on an older theory of polytope
angles [Grü67, Grü68, McM75]. Spherical integral geometry reached its current state of development in the
dissertation [Gla95, Gla96]. Chapter 6.5 of [SW08] and the notes therein summarize this research. We also
draw on insights from the thesis [Ame11].

7.3 Conclusions and future directions 

The results in this work demonstrate the power of spherical integral geometry in the context of deconvo- 
lution. Our bounds are often tight, and they are broadly applicable. This approach raises many questions 
worth further attention. We conclude with a list of directions for future work. 

• Tight results for Lagrangian deconvolution. The Lagrange penalized deconvolution method (1.3) is
important because it requires less knowledge about the unobserved vectors $(x_0, y_0)$ than the
constrained method (1.2). The results in this work give information regarding the potential for, and
the limits of, the penalized deconvolution approach (1.3). Nevertheless, a precise analysis of the
penalized problem (1.3) and its dependence on the penalty parameter $\lambda$ would have real practical
value.

• Multiple deconvolution. It would be interesting to study deconvolution problems involving more 
than two structured vectors. 

• Spherical intrinsic volumes for more cones. Computation of additional decay thresholds will provide
new bounds for convex deconvolution methods. The sharpest decay thresholds appear to require
formulas for spherical intrinsic volumes. For instance, an asymptotic analysis of the spherical
intrinsic volumes for feasible cones of the Schatten 1-norm would provide sharp recovery results for
low-rank matrix deconvolution problems. Amelunxen has made some recent progress in this
direction by developing a formula for the spherical intrinsic volumes for the semidefinite cone [Ame11,
Appendix C].

• Log-concavity of spherical intrinsic volumes. Amelunxen [Ame11, Conj. 4.4.16] and Bürgisser &
Amelunxen [BA10, Conj. 2.19] have conjectured that the sequence of spherical intrinsic volumes is
log-concave. This conjecture is closely related to the question of whether the upper and lower decay
thresholds match. In particular, it would then hold that $\kappa_{\ell_1}(\tau) = \theta_{\ell_1}(\tau)$; we have only been able to
produce numerical evidence of this identity.

• Extensions to more general probability measures. The analysis in this work focuses on a specific 
random model. It would be interesting to incorporate more general probability measures into our 
framework. This may be a difficult problem; by the results of Section 5.2, this question is closely 
related to the observed universality phenomenon in basis pursuit [DT09b]. 

A Equivalence of the constrained and penalized methods 

This appendix provides a geometric proof of the equivalence between the constrained (1.2) and penal- 
ized (1.3) convex deconvolution methods. The results in this section allow us to interpret our conditions 
for the success of the constrained deconvolution method (1.2) as limits on, and opportunities for, the 
Lagrange deconvolution method (1.3). 

We begin with the following well-known result; it holds without any technical restrictions. We omit 
the demonstration, which is an easy exercise in proof by contradiction. (See also [Roc97, Cor. 28.1.1].) 

Proposition A.1. Suppose the Lagrange problem (1.3) succeeds for some value $\lambda > 0$. Then (1.2) succeeds.

Before stating a partial converse to Proposition A.1, we require a technical definition. We say that a
proper convex function $f$ is typical at $x$ if $f$ is subdifferentiable at $x$ but does not achieve its minimum at
$x$. With this technical condition in place, we have the following complement to Proposition A.1.

Proposition A.2. Suppose $f$ is typical at $x_0$ and $g$ is typical at $y_0$. If the constrained method (1.2) succeeds,
then there exists a parameter $\lambda > 0$ such that the Lagrange method (1.3) succeeds.

Proof of Proposition A.2. The key idea is the construction of a subgradient that certifies the optimality of
the pair $(x_0, y_0)$ for the Lagrange penalized problem (1.3) for an appropriate choice of parameter $\lambda$. As
with many results in convex analysis, a separating hyperplane plays an important role.

By Lemma 2.3, the constrained problem (1.2) succeeds if and only if $\mathcal{F}(f, x_0) \cap -U\mathcal{F}(g, y_0) = \{0\}$. The
trivial intersection of the feasible cones implies that there exists a hyperplane that separates these cones.
(This fact is a special case of the Hahn-Banach separation theorem for convex cones due to Klee [Kle55].)
In other words, there exists some vector $u \ne 0$ such that

$$\langle u, x \rangle \le 0 \quad \text{for all } x \in \mathcal{F}(f, x_0),$$

and moreover

$$\langle u, y \rangle \ge 0 \quad \text{for all } y \in -U\mathcal{F}(g, y_0).$$

In the language of polar cones, the first separation inequality is simply the statement that $u \in \mathcal{F}(f, x_0)^\circ$,
while the second inequality is equivalent to $U^* u \in \mathcal{F}(g, y_0)^\circ$.

We will now show that $u$ generates a subgradient optimality certificate for the point $(x_0, y_0)$ in
problem (1.3) for an appropriate choice of parameter $\lambda > 0$. We denote the subgradient map by $\partial$.

At this point, we invoke our technical assumption. Since $f$ is typical at $x_0$, the polar to the feasible
cone is generated by the subgradient of $f$ at $x_0$ [Roc97, Thm 23.7]. In particular, there exists a number
$\lambda_f \ge 0$ such that $u \in \lambda_f \partial f(x_0)$. In fact, the stronger inequality $\lambda_f > 0$ holds because $u \ne 0$. For the same
reason, there exists a number $\lambda_g > 0$ such that $U^* u \in \lambda_g \partial g(y_0)$.

Define $h(x) := \lambda_f f(x) + \lambda_g g(U^*(z_0 - x))$. By standard transformation rules for subgradients [Roc97,
Thms. 23.8, 23.9], we have

$$\partial h(x_0) \supset \lambda_f \partial f(x_0) - \lambda_g U \partial g(y_0),$$

where $A - B := A + (-B)$ is the Minkowski sum of the sets $A$ and $-B$. Since $u \in \lambda_f \partial f(x_0)$ and $u \in \lambda_g U \partial g(y_0)$,
we see $0 \in \partial h(x_0)$. By the definition of subgradients, $x_0$ is a global minimizer of $h$. Introducing the variable
$y = U^*(z_0 - x)$, it follows that $(x_0, y_0)$ is a global minimizer of

$$\text{minimize } f(x) + \frac{\lambda_g}{\lambda_f}\, g(y) \quad \text{subject to } x + Uy = z_0.$$

This is Lagrange problem (1.3) with the parameter $\lambda = \lambda_g / \lambda_f > 0$, so we have the result. □

B Regions of failure and uniform guarantees 

We now present the proofs of the results of Section 4 concerning regions of failure (Theorem 4.4) and 
strong deconvolution guarantees (Theorem 4.11) for the convex deconvolution method. These demon- 
strations closely follow the pattern laid down by the proof of Theorem 4.3. 

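B.1 Regions of failure: The proof of Theorem 4.4

We first state an analog of Theorem 4.5. As usual, $\mathcal{D}$ is an infinite set of indices.

Theorem B.1. Let $\{K^{(d)} \subset \mathbb{R}^d : d \in \mathcal{D}\}$ and $\{\tilde K^{(d)} \subset \mathbb{R}^d : d \in \mathcal{D}\}$ be two ensembles of closed convex cones with
lower decay thresholds $\kappa_\star$ and $\tilde\kappa_\star$. If $\kappa_\star + \tilde\kappa_\star > 1$, then there exists a constant $\varepsilon > 0$ such that
$\mathbb{P}\{K^{(d)} \cap Q\tilde K^{(d)} \ne \{0\}\} \ge 1 - e^{-\varepsilon d}$ for all sufficiently large $d$.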

Theorem 4.4 follows from Theorem B.l in the same way that Theorem 4.3 follows from Theorem 4.5 
with one additional technical point regarding closure conditions. 

Proof of Theorem 4.4 from Theorem B.1. The assumptions in Theorem 4.4 imply that the ensembles
$\{\overline{\mathcal{F}(f^{(d)}, x^{(d)})}\}$ and $\{-\overline{\mathcal{F}(g^{(d)}, y^{(d)})}\}$ of closed cones satisfy the hypotheses of Theorem B.1. Therefore,
there is a constant $\varepsilon > 0$ such that the closures of the feasible cones have nontrivial intersection with
probability at least $1 - e^{-\varepsilon d}$, for all large enough $d$.

It follows from Remark 4.6 that the probability of the event $\overline{\mathcal{F}(f^{(d)}, x^{(d)})} \cap -Q\overline{\mathcal{F}(g^{(d)}, y^{(d)})} \ne \{0\}$ is equal
to the probability of the event $\mathcal{F}(f^{(d)}, x^{(d)}) \cap -Q\mathcal{F}(g^{(d)}, y^{(d)}) \ne \{0\}$. Applying the geometric optimality
condition of Lemma 2.3 immediately implies that (1.2) fails with probability at least $1 - e^{-\varepsilon d}$. □

The proof of Theorem B.l requires an additional fact concerning spherical intrinsic volumes. 

Fact B.2 (Spherical Gauss-Bonnet formula [SW08, Page 258]). For any closed convex cone $K \subset \mathbb{R}^d$ that is
not a subspace, we have

$$\sum_{\substack{i=-1 \\ i \text{ even}}}^{d-1} v_i(K) = \sum_{\substack{i=-1 \\ i \text{ odd}}}^{d-1} v_i(K) = \frac{1}{2}.$$

In the proof below, the Gauss-Bonnet formula is crucial for dealing with the parity term $(1 + (-1)^k)$
that arises in the spherical kinematic formula (3.2).
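
As a quick sanity check on the Gauss-Bonnet formula, the following Python sketch evaluates both parity sums for the nonnegative orthant. It assumes the standard binomial expression $v_i(\mathbb{R}^d_+) = 2^{-d}\binom{d}{i+1}$ for $i = -1, \dots, d-1$; the function names are illustrative, and exact rational arithmetic is used so that the comparison with 1/2 is exact.

```python
from fractions import Fraction
from math import comb


def orthant_spherical_intrinsic_volumes(d):
    """Assumed formula: v_i(R^d_+) = 2^{-d} * C(d, i + 1) for i = -1, ..., d - 1."""
    return {i: Fraction(comb(d, i + 1), 2 ** d) for i in range(-1, d)}


def parity_sums(volumes):
    """Return (sum over even i, sum over odd i); note that i = -1 counts as odd."""
    even = sum(v for i, v in volumes.items() if i % 2 == 0)
    odd = sum(v for i, v in volumes.items() if i % 2 == 1)
    return even, odd


if __name__ == "__main__":
    for d in (1, 2, 5, 12):
        print(d, parity_sums(orthant_spherical_intrinsic_volumes(d)))
```

Both sums equal 1/2 for every $d \ge 1$, as the fact predicts, because the even and odd binomial coefficients each sum to $2^{d-1}$.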

Proof of Theorem B.1. Since the Gauss-Bonnet formula only applies to cones that are not subspaces, we
split the demonstration into three cases: neither ensemble $\{K^{(d)}\}$ nor $\{\tilde K^{(d)}\}$ contains a subspace, one
ensemble consists of subspaces, or both ensembles consist of subspaces. We assume without loss that each
case holds for every dimension $d \in \mathcal{D}$; the proof extends to the general case by considering subsequences
where only a single case applies.

We drop the superscript $d$ for clarity. Assume first that neither $K$ nor $\tilde K$ is a subspace. Let
$\varpi(k) := (1 + (-1)^k)$ be the parity term in the spherical kinematic formula (3.2). Changing the order of summation
in the spherical kinematic formula, we find

$$P := \mathbb{P}\{K \cap Q\tilde K \ne \{0\}\} = \sum_{i=0}^{d-1} v_i(K) \sum_{k=d-i-1}^{d-1} \varpi(k - d + 1 + i)\, v_k(\tilde K).$$
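
To make the reordered kinematic sum concrete, the following Python sketch evaluates $\sum_{i=0}^{d-1} v_i(K) \sum_{k=d-i-1}^{d-1} \varpi(k-d+1+i)\, v_k(\tilde K)$, with $\varpi(k) = 1 + (-1)^k$, in the case where both cones are the nonnegative orthant. It assumes the binomial formula $v_i(\mathbb{R}^d_+) = 2^{-d}\binom{d}{i+1}$; the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def orthant_volumes(d):
    """Assumed formula: v_i(R^d_+) = 2^{-d} * C(d, i + 1) for i = -1, ..., d - 1."""
    return {i: Fraction(comb(d, i + 1), 2 ** d) for i in range(-1, d)}


def parity(k):
    """Parity term 1 + (-1)^k; the argument is nonnegative wherever it is used below."""
    return 1 + (-1) ** k


def intersection_probability(v, v_tilde, d):
    """Reordered kinematic sum; valid only when the cones are not both subspaces."""
    total = Fraction(0)
    for i in range(0, d):
        inner = sum(parity(k - d + 1 + i) * v_tilde[k] for k in range(d - i - 1, d))
        total += v[i] * inner
    return total


if __name__ == "__main__":
    for d in (1, 2, 3):
        v = orthant_volumes(d)
        print(d, intersection_probability(v, v, d))
```

For $d = 1$ and $d = 2$ the value 1/2 agrees with a direct computation: a random sign maps a half-line onto itself with probability 1/2, and two randomly oriented quarter-planes share a ray for half of all orientations.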

Let $\kappa < \kappa_\star$ and $\tilde\kappa < \tilde\kappa_\star$ with $\kappa + \tilde\kappa > 1$; such scalars exist because $\kappa_\star + \tilde\kappa_\star > 1$. By positivity of the spherical
intrinsic volumes (Fact 3.2.1), we have

$$P \ge \sum_{i=\lceil \kappa d \rceil + 1}^{d-1} v_i(K) \sum_{k=d-i-1}^{d-1} \varpi(k - d + 1 + i)\, v_k(\tilde K), \tag{B.1}$$

where we recall that $\varpi(k) = 1 + (-1)^k$. We will see that the inner sum above is very close to one. Indeed,

$$\sum_{k=d-i-1}^{d-1} \varpi(k - d + 1 + i)\, v_k(\tilde K) = \sum_{k=-1}^{d-1} \varpi(k - d + 1 + i)\, v_k(\tilde K) - \xi_i = 1 - \xi_i, \tag{B.2}$$

where $\xi_i$ is a discrepancy term (see (B.3) below). The second equality follows by the spherical
Gauss-Bonnet formula (Fact B.2), and the assumption that $\tilde K$ is not a subspace.

We now bound the discrepancy term uniformly over $i \ge \lceil \kappa d \rceil + 1$. Since $\kappa + \tilde\kappa > 1$, for any $i \ge \lceil \kappa d \rceil + 1$
we have $d - 2 - i \le \lceil \tilde\kappa d \rceil$. By definition of the lower decay threshold, we see that the discrepancy term must
be small: for any $i \ge \lceil \kappa d \rceil + 1$,

$$\xi_i = \sum_{k=-1}^{d-2-i} \varpi(k - d + 1 + i)\, v_k(\tilde K) \le 2 \sum_{k=-1}^{\lceil \tilde\kappa d \rceil} v_k(\tilde K) \le 2(d-1)\, e^{-\varepsilon' d}, \tag{B.3}$$

for some $\varepsilon' > 0$ and all sufficiently large $d$. Applying (B.2) and (B.3) to (B.1), we find

$$P \ge \sum_{i=\lceil \kappa d \rceil + 1}^{d-1} v_i(K) \left(1 - 2(d-1)\, e^{-\varepsilon' d}\right) \ge \left( \sum_{i=\lceil \kappa d \rceil + 1}^{d-1} v_i(K) \right) - 2(d-1)\, e^{-\varepsilon' d}, \tag{B.4}$$

where the second inequality follows from Fact 3.2: the spherical intrinsic volumes are positive and sum to
one. We now reindex the sum on the right-hand side of (B.4) over $i = -1, 0, \dots, d-1$ with only exponentially
small loss:

$$\sum_{i=\lceil \kappa d \rceil + 1}^{d-1} v_i(K) = \sum_{i=-1}^{d-1} v_i(K) - \xi,$$

where the discrepancy $\xi$ satisfies

$$\xi = \sum_{i=-1}^{\lceil \kappa d \rceil} v_i(K) \le (d-1)\, e^{-\varepsilon'' d},$$

for some $\varepsilon'' > 0$ and all sufficiently large $d$ by definition of the lower decay threshold. Applying these
observations to (B.4), we deduce that

$$P \ge \sum_{i=-1}^{d-1} v_i(K) - (d-1)\left(2 e^{-\varepsilon' d} + e^{-\varepsilon'' d}\right) \ge \sum_{i=-1}^{d-1} v_i(K) - e^{-\varepsilon d}$$

for some $\varepsilon > 0$ and all sufficiently large $d$. Since $\sum_{i=-1}^{d-1} v_i(K) = 1$ by Fact 3.2.2, this shows the result when
both $K$ and $\tilde K$ are not subspaces, completing the first case.

For the second case, suppose that only one of the cones is a subspace. Without loss, we may assume
$\tilde K$ is the subspace by the symmetry of the spherical kinematic formula (see Remark 3.7). Denote the
dimension of the subspace $\tilde K$ by $\tilde n := \dim(\tilde K)$, and take parameters $\kappa$ and $\tilde\kappa$ as above.

By Proposition 3.3, the spherical intrinsic volumes of $\tilde K$ are given by $v_i(\tilde K) = \delta_{i, \tilde n - 1}$. Inserting this
Kronecker $\delta$ into the spherical kinematic formula (3.2) and simplifying the resulting expression, we find
the probability of interest is given by

$$P := \mathbb{P}\{K \cap Q\tilde K \ne \{0\}\} = \sum_{k=d-\tilde n}^{d-1} \varpi(k + \tilde n - d)\, v_k(K),$$

where $\varpi(k) = 1 + (-1)^k$ is the parity term. Reindexing the sum over $k = -1, 0, \dots, d-1$, we see

$$P = \sum_{k=-1}^{d-1} \varpi(k + \tilde n - d)\, v_k(K) - \sum_{k=-1}^{d-\tilde n-1} \varpi(k + \tilde n - d)\, v_k(K) = 1 - \sum_{k=-1}^{d-\tilde n-1} \varpi(k + \tilde n - d)\, v_k(K), \tag{B.5}$$

where the second equality holds by the spherical Gauss-Bonnet formula (Fact B.2).

We now show that $\tilde n$ is relatively large. The definition of the lower decay threshold implies that there
exists an $\varepsilon' > 0$ such that

$$v_i(\tilde K) = \delta_{i, \tilde n - 1} \le e^{-\varepsilon' d} \quad \text{for all } i \le \lceil \tilde\kappa d \rceil$$

when $d$ is sufficiently large. This inequality cannot hold if $\tilde n - 1 \le \lceil \tilde\kappa d \rceil$, so we deduce that $\tilde n \ge \tilde\kappa d$ for all
sufficiently large $d$.

Since $\tilde n \ge \tilde\kappa d$ and $\kappa + \tilde\kappa > 1$, we must have $d - \tilde n - 1 \le \lceil \kappa d \rceil$ for all sufficiently large $d$. Applying the
definition of the lower decay threshold, we find the sum on the right-hand side of (B.5) is exponentially
small: there exists an $\varepsilon'' > 0$ such that

$$\sum_{k=-1}^{d-\tilde n-1} \varpi(k + \tilde n - d)\, v_k(K) \le 2(d-1)\, e^{-\varepsilon'' d}$$

for all sufficiently large $d$. The result for the second case follows immediately.

Finally, we consider the case when both of the cones are subspaces. Suppose $K$ has dimension $n$,
while $\tilde K$ has dimension $\tilde n$, and let $\kappa, \tilde\kappa$ be as above. As in the second case, we find that $n \ge \kappa d$ and $\tilde n \ge \tilde\kappa d$
when $d$ is sufficiently large. Then the inequality $\kappa + \tilde\kappa > 1$ implies that $n + \tilde n > d$, that is, the sum of the
dimensions of the subspaces is larger than the ambient dimension. A standard fact from linear algebra
implies $K \cap Q\tilde K \ne \{0\}$ for any unitary $Q$. In other words, for all $d$ large enough, the probability of nontrivial
intersection is one. This completes the third case, and we are done. □

B.2 Proof of the strong guarantees of Theorem 4.11 

Proof of Theorem 4.11. For clarity, we drop the superscript $d$ in this proof. We begin with the union bound:
the probability of interest $P$ is bounded above by

$$P := \mathbb{P}\{K \cap Q\tilde K \ne \{0\} \text{ for any } K \in \mathcal{K},\ \tilde K \in \tilde{\mathcal{K}}\} \le |\mathcal{K}| \cdot |\tilde{\mathcal{K}}| \cdot \mathbb{P}\{K \cap Q\tilde K \ne \{0\}\}. \tag{B.6}$$

From here, the proof closely parallels the proof of Theorem 4.5, so we compress the demonstration. We
consider two cases, one where at least one cone is not a subspace, the other where both cones are subspaces;
the result extends to the mixed case by considering subsequences.

Suppose first that at least one cone is not a subspace. Let $\theta > \theta_\star$ and $\tilde\theta > \tilde\theta_\star$ with $\theta + \tilde\theta < 1$. We bound the
probability on the right-hand side of (B.6) by

$$\tfrac{1}{2}\, \mathbb{P}\{K \cap Q\tilde K \ne \{0\}\} \le \Sigma_1 + \Sigma_2 + \Sigma_3 + \Sigma_4,$$

where the $\Sigma_i$ are given in (4.3). The fact that $\theta + \tilde\theta < 1$ implies that $\Sigma_1 = 0$ for sufficiently large $d$, as in the
proof of Theorem 4.5. Since $\theta > \theta_\star$, the definition of the upper decay threshold at level $\psi + \tilde\psi$ implies

$$\Sigma_2 \le \sum_{i=\lceil \theta d \rceil + 1}^{d-1} v_i(K) \le (d-1)\, e^{-d(\psi + \tilde\psi + \varepsilon')}$$

for some $\varepsilon' > 0$ and all sufficiently large $d$. With analogous reasoning, we find similar exponential bounds
for $\Sigma_3$ and $\Sigma_4$:

$$\Sigma_3 \le (d-1)\, e^{-d(\psi + \tilde\psi + \varepsilon'')}, \qquad \Sigma_4 \le (d-1)\, e^{-d(\psi + \tilde\psi + \varepsilon''')},$$

again for positive $\varepsilon'', \varepsilon'''$ and all sufficiently large $d$. Summing these inequalities and taking $d$ sufficiently
large gives

$$\tfrac{1}{2}\, \mathbb{P}\{K \cap Q\tilde K \ne \{0\}\} \le e^{-d(\psi + \tilde\psi + \varepsilon)}$$

for some $\varepsilon > 0$. The claim then follows from our exponential upper bound on the growth of $|\mathcal{K}|$ and $|\tilde{\mathcal{K}}|$
with $\eta = \varepsilon/2$:

$$P \le 2 \cdot |\mathcal{K}| \cdot |\tilde{\mathcal{K}}| \cdot e^{-d(\psi + \tilde\psi + \varepsilon)} \le 2\, e^{d(\psi + \tilde\psi + \varepsilon/2) - d(\psi + \tilde\psi + \varepsilon)} = 2\, e^{-\varepsilon d/2}.$$

Taking $\bar\varepsilon = \varepsilon/4$ and $d$ sufficiently large gives the claim in the first case.

Now consider the case where both cones are subspaces, and let $n := \dim(K)$ and $\tilde n := \dim(\tilde K)$. Take
parameters $\theta > \theta_\star$ and $\tilde\theta > \tilde\theta_\star$ such that $\theta + \tilde\theta < 1$. As in the proof of Theorem 4.5, the Kronecker $\delta$
expression for the intrinsic volumes of the subspaces $K$ and $\tilde K$ given by Proposition 3.3, combined with
the definition of the upper decay threshold, reveals that $n \le \lceil \theta d \rceil$ and $\tilde n \le \lceil \tilde\theta d \rceil$ for all sufficiently large $d$.
The fact that $\theta + \tilde\theta < 1$ implies $n + \tilde n < d$ for all sufficiently large $d$. Since randomly oriented subspaces are
almost always in general position, the probability that $K \cap Q\tilde K \ne \{0\}$ is zero. This is the second case, so we
are done. □



C Decay thresholds for feasible cones of the $\ell_1$ norm

This section describes how we compute decay thresholds for the feasible cone of the $\ell_1$ norm at sparse vectors.
The polytope angle calculations appearing in [Don06b] form an important part of this computation. For
convenient comparisons, Table 2 provides a map between our notation and that of the reference.

Fix a sparsity parameter $\tau \in [0, 1]$, and let $\mathcal{D}$ be an infinite set of indices. For each dimension $d \in \mathcal{D}$, we
define a vector $x^{(d)} \in \mathbb{R}^d$ such that $\operatorname{nnz}(x^{(d)}) = \lceil \tau d \rceil$. The following result describes the behavior of the
spherical intrinsic volumes of the feasible cone $\mathcal{F}(\|\cdot\|_{\ell_1}, x^{(d)})$ in terms of the sparsity $\tau$ and the normalized
index $\theta = i/d$ when $d$ is large.

Lemma C.1. Consider the ensemble above. There exists a function $\Psi_{\mathrm{total}}$ such that, for every $\varepsilon > 0$ and all
sufficiently large $d \in \mathcal{D}$, we have

$$\frac{1}{d} \log\left( v_{\lceil \theta d \rceil}\big(\mathcal{F}(\|\cdot\|_{\ell_1}, x^{(d)})\big) \right) \le \Psi_{\mathrm{total}}(\theta, \tau) + \varepsilon \tag{C.1}$$

for all $\theta \in [\tau, 1]$, and

$$v_{\lceil \theta d \rceil}\big(\mathcal{F}(\|\cdot\|_{\ell_1}, x^{(d)})\big) = 0 \tag{C.2}$$

for $\theta \in [0, \tau)$.

We discuss the definition and computation of the normalized exponent $\Psi_{\mathrm{total}}$ in Section C.1. This
function provides decay thresholds for the ensemble $\{\mathcal{F}(\|\cdot\|_{\ell_1}, x^{(d)}) : d \in \mathcal{D}\}$ in the same way that the
limit (3.5) provides decay thresholds for the ensemble of orthants. See Section C.2 for details.

Proof of Lemma C.1. We leave the dependence on the dimension implicit for clarity. Define $k := \lceil \tau d \rceil$.

We first show that (C.1) holds. The proof relies on an expression for spherical intrinsic volumes in
terms of polytope angles. For a face $F$ of a polytope $P$, we define $\beta(F, P)$ as the internal angle of $P$ at $F$
and $\gamma(F, P)$ as the external angle of $P$ at $F$ (see [Grü67, Chapter 14] for the definitions). The following is an
important alternative characterization of the spherical intrinsic volumes in terms of these angles.

Fact C.2 ([SW08, Equation (6.50)]). Let $K$ be a polyhedral cone, and let $\mathfrak{F}_i(K)$ be the set of all $i$-dimensional
faces of $K$. Then

$$v_i(K) = \sum_{F \in \mathfrak{F}_{i+1}(K)} \beta(0, F)\, \gamma(F, K). \tag{C.3}$$
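
Table 2 Translation between this work and [Don06b]. Note that in the reference, $\Psi_{\mathrm{net}}$ is defined for three
arguments, but only depends on two parameters, namely $\nu$ and $\rho\delta$.

Quantity | Our notation | Notation of [Don06b]
Sparsity ratio | $\tau$ | $\rho\delta$
Ratio of measurements to ambient dimension | $\sigma$ | $\delta$
Undersampling ratio | $\tau/\sigma$ | $\rho$
Normalized index | $\theta$ | $\nu$
Internal exponent | $\Psi_{\mathrm{int}}(\theta, \tau)$ |
External exponent | $\Psi_{\mathrm{ext}}(\theta)$ | $\Psi_{\mathrm{ext}}(\nu, \rho\delta)$
Total exponent | $\Psi_{\mathrm{total}}(\theta, \tau)$ | $\Psi_{\mathrm{net}}(\nu, \rho, \delta) - \Psi_{\mathrm{face}}(\rho\delta)$
Net exponent | |
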

We now specialize Fact C.2 to the case where $K = \mathcal{F}(\|\cdot\|_{\ell_1}, x)$. Define the sublevel set $S := \{w : \|w\|_{\ell_1} \le \|x\|_{\ell_1}\}$.
Recalling that our assumption $\tau > 0$ implies $\|x\|_{\ell_1} > 0$, we have

$$\mathcal{F}(\|\cdot\|_{\ell_1}, x) = \operatorname{cone}(S - \{x\}) = \operatorname{cone}(C - \{x / \|x\|_{\ell_1}\}), \tag{C.4}$$

where $C := \{w : \|w\|_{\ell_1} \le 1\}$ is the standard crosspolytope. The fact that $x$ is $k$-sparse is equivalent to the
statement that $x / \|x\|_{\ell_1}$ lies in the relative interior of a $(k-1)$-dimensional face of the crosspolytope $C$.

Relationship (C.4) implies that there is a one-to-one correspondence between the $i$-dimensional faces
of $\mathcal{F}(\|\cdot\|_{\ell_1}, x)$ and the $i$-dimensional faces of the crosspolytope $C$ that contain $x / \|x\|_{\ell_1}$. Since the
internal and external angles only depend on the local structure of a given polytope, we find that for every
nonempty face $F$ of $\mathcal{F}(\|\cdot\|_{\ell_1}, x)$ the internal and external angles satisfy

$$\beta(0, F) = \beta(x, F') \quad \text{and} \quad \gamma(F, K) = \gamma(F', C),$$

where $F'$ is the face of the crosspolytope $C$ naturally corresponding to the face $F$ of the feasible cone
$\mathcal{F}(\|\cdot\|_{\ell_1}, x)$.

A number of important relationships due to Böröczky & Henk [BH99] for faces of the crosspolytope $C$
are conveniently collected in [Don06b, Section 3.3]. In particular, we will need the following two facts:

1. There are $2^{i-k+2} \binom{d-k}{i-k+2}$ faces of $C$ of dimension $(i+1) \ge (k-1)$ containing a given $(k-1)$-dimensional
face of $C$, and

2. The high degree of symmetry of the crosspolytope ensures that the internal and external angles at
these faces depend only on the dimensional parameters $k$ and $i$.

Applying the observations above to equation (C.3), we find

$$v_i\big(\mathcal{F}(\|\cdot\|_{\ell_1}, x)\big) = 2^{i-k+2} \binom{d-k}{i-k+2}\, \beta(T^{k-1}, T^{i+1})\, \gamma(F^{i+1}, C) \tag{C.5}$$

for $i = k-1, \dots, d-1$. Above, $F^{i+1}$ is any $(i+1)$-dimensional face of the crosspolytope $C$, and $T^j$ is the
$j$-dimensional regular simplex.

The internal and external angles in (C.5) have explicit expressions due to [BH99] and the work of
Ruben [Rub60]. Donoho [Don06b] conducts an asymptotic investigation of these formulas. To distill
the essence of the analysis, Donoho gives continuous functions $\Psi_{\mathrm{int}}(\theta, \tau)$ and $\Psi_{\mathrm{ext}}(\theta)$ such that, for any
$\varepsilon > 0$ and all sufficiently large $d$, the inequalities

$$\frac{1}{d} \log\big(\beta(T^{k-1}, T^{i+1})\big) \le -\Psi_{\mathrm{int}}\big(\tfrac{i}{d}, \tau\big) + \frac{\varepsilon}{3} \tag{C.6}$$

$$\frac{1}{d} \log\big(\gamma(F^{i+1}, C)\big) \le -\Psi_{\mathrm{ext}}\big(\tfrac{i}{d}\big) + \frac{\varepsilon}{3} \tag{C.7}$$

hold uniformly over $i = \lceil \tau d \rceil, \dots, d-1$. Moreover, it follows from Equation (3.5) that for sufficiently large
$d$, we have

$$\frac{1}{d} \log\left( 2^{i-k+2} \binom{d-k}{i-k+2} \right) \le \Psi_{\mathrm{cont}}\big(\tfrac{i}{d}, \tau\big) + \frac{\varepsilon}{3},$$

where the exponent $\Psi_{\mathrm{cont}}$ for the number of containing faces is defined by

$$\Psi_{\mathrm{cont}}(\theta, \tau) := (\theta - \tau) \log(2) + (1 - \tau)\, H\!\left( \frac{\theta - \tau}{1 - \tau} \right). \tag{C.8}$$

The function $H(\theta)$ is the entropy defined by (3.6). Equation (C.1) follows by defining

$$\Psi_{\mathrm{total}}(\theta, \tau) := \Psi_{\mathrm{cont}}(\theta, \tau) - \Psi_{\mathrm{int}}(\theta, \tau) - \Psi_{\mathrm{ext}}(\theta) \tag{C.9}$$

and taking logarithms in (C.5). This is the first claim.

We now show that, for any $\theta < \tau$, Equation (C.2) holds for all sufficiently large $d$. Since $x / \|x\|_{\ell_1}$ lies in
a $(k-1)$-dimensional face of the crosspolytope $C$, every face of $\mathcal{F}(\|\cdot\|_{\ell_1}, x)$ has dimension at least $(k-1)$.
It follows immediately from Definition 3.1 that

$$v_i\big(\mathcal{F}(\|\cdot\|_{\ell_1}, x)\big) = 0$$

for all $i < k-1$. Since $k = \lceil \tau d \rceil$, we see that (C.2) holds for all sufficiently large $d$ so long as $\theta < \tau$. This is
the second claim. □



C.1 Computing the exponents

We now define the functions needed to compute $\Psi_{\mathrm{total}}$ in (C.9). Recall that $\Psi_{\mathrm{cont}}$ is explicitly defined
in (C.8). The functions $\Psi_{\mathrm{int}}$ and $\Psi_{\mathrm{ext}}$ are defined in [Don06b], but we recapitulate their definitions for
completeness. Define implicit parameters $x = x(\theta)$ and $s = s(\theta, \tau)$ as the solutions to the equations

$$\frac{2 x G(x)}{G'(x)} = \frac{1 - \theta}{\theta}, \tag{C.10}$$

$$M(s) = 1 - \frac{\tau}{\theta}, \tag{C.11}$$

where $G(x) = \frac{2}{\sqrt{\pi}} \int_0^x \exp(-t^2)\, dt$ is the error function $\operatorname{erf}(x)$ and $M(s)$ is a variant of the Mills' ratio given by

$$M(s) = -s\, e^{s^2/2} \int_{-\infty}^{s} e^{-t^2/2}\, dt = -s \sqrt{\frac{\pi}{2}}\, \operatorname{erfcx}\!\left( -\frac{s}{\sqrt{2}} \right),$$

where $\operatorname{erfcx}(s) = e^{s^2} \operatorname{erfc}(s)$ is the scaled complementary error function. This second form for $M(s)$ is
convenient for numerical computations. It follows from [Don06b] that $x$ and $s$ are well defined. Numerical
evaluation of $x(\theta)$ and $s(\theta, \tau)$ is straightforward using, for example, bisection methods.
With these parameters in hand, the exponent for the internal angle is given by f 

/ S0 \ TS 2 

Yint(0,T):=(6/-T)log v^— J-— , (C.12) 

where $s = s(\theta, \tau)$ satisfies equation (C.11). The exponent for the external angle is

$$\Psi_{\mathrm{ext}}(\theta) := -(1 - \theta) \log(G(x)) + \theta x^2, \tag{C.13}$$

where $x = x(\theta)$ is given by (C.10) above.
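
The bisection approach mentioned above can be sketched in Python with the standard library alone. The sketch assumes the reading $2xG(x)/G'(x) = (1-\theta)/\theta$ of (C.10) and $M(s) = 1 - \tau/\theta$ of (C.11), with $G = \operatorname{erf}$, and it evaluates $M$ directly from `math.erfc` rather than from a scaled complementary error function (which the standard library does not provide). All function names and the bracket limit of 32 are choices made for this illustration.

```python
import math


def ext_lhs(x):
    """Left side of (C.10): 2 x G(x) / G'(x), where G'(x) = (2 / sqrt(pi)) exp(-x^2)."""
    return math.sqrt(math.pi) * x * math.erf(x) * math.exp(x * x)


def mills_variant(s):
    """M(s) = -s sqrt(pi/2) exp(s^2/2) erfc(-s / sqrt(2)), intended for s <= 0."""
    return -s * math.sqrt(math.pi / 2.0) * math.exp(s * s / 2.0) * math.erfc(-s / math.sqrt(2.0))


def bisect_increasing(func, target, lo, hi, iters=200):
    """Solve func(t) = target for an increasing func with func(lo) <= target <= func(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve_x(theta):
    """x(theta) from (C.10), for 0 < theta < 1."""
    target = (1.0 - theta) / theta
    hi = 1.0
    while ext_lhs(hi) < target:
        hi *= 2.0
    return bisect_increasing(ext_lhs, target, 0.0, hi)


def solve_s(theta, tau):
    """s(theta, tau) <= 0 from (C.11), for 0 < tau <= theta <= 1."""
    target = 1.0 - tau / theta
    hi = 1.0
    while mills_variant(-hi) < target:  # u -> M(-u) increases from 0 toward 1
        hi *= 2.0
        if hi > 32.0:  # hypothetical cap that avoids overflow in exp(s^2 / 2)
            raise ValueError("tau / theta is too small for this simple bracket")
    return -bisect_increasing(lambda u: mills_variant(-u), target, 0.0, hi)


def psi_ext(theta):
    """External exponent (C.13): -(1 - theta) log G(x) + theta x^2 with x = x(theta)."""
    x = solve_x(theta)
    return -(1.0 - theta) * math.log(math.erf(x)) + theta * x * x
```

For example, `solve_x(0.5)` returns the point where the left side of (C.10) equals one, and `solve_s(0.5, 0.1)` returns the negative value of $s$ at which $M(s) = 0.8$.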

Figure 12 displays $\Psi_{\mathrm{total}}(\cdot, \tau)$ for a few values of the parameter $\tau$. Empirically, it appears that $\Psi_{\mathrm{total}}(\cdot, \tau)$
is concave for every value of $\tau \in [0, 1]$ and has a unique maximal value of zero.



†Equation (C.12) requires a significant amount of wholly uninteresting algebraic simplification from the formulas of [Don06b].
The key steps in this simplification follow from the equations on page 638 of the reference. In particular, we write $y$ explicitly in terms
of $s$ with their Eq. (6.12), and then write $\xi$ explicitly in terms of $s$ using this expression for $y$; see equation (6.13) in the reference.
Noting that the reference defines $\gamma = \tau/\theta$ on page 631 gives (C.12), modulo trivial simplifications.
[Figure 12 plot: normalized spherical intrinsic volume of the feasible cone of the $\ell_1$ norm at sparse vectors. Horizontal axis: $\theta$, the normalized index ($\theta = i/d$).]

Figure 12 Upper bounds for the exponent of the spherical intrinsic volumes. We plot $\Psi_{\mathrm{total}}(\cdot,\tau)$ for several
different values of $\tau$. The best upper decay threshold at level $\psi$ is given by the rightmost $\theta$ for which the curve
intersects the horizontal line at $-\psi$. The short dashes show the upper decay threshold $\theta_{\ell_1}(0.1, 0)$, while the
long dashes show $\theta_{\ell_1}(0.1, 0.1)$, defined in Equation (C.14). For each $\tau$, the upper decay threshold at level zero is
numerically equal to the lower decay threshold.



C.2 Defining decay thresholds 

The exponent $\Psi_{\mathrm{total}}$ provides decay thresholds for the $\ell_1$ norm at proportionally sparse vectors. We define

$$\theta_{\ell_1}(\tau, \psi) := \inf\{\theta_\star \in [0, 1] : \Psi_{\mathrm{total}}(\theta,\tau) < -\psi \text{ for all } \theta \in (\theta_\star, 1]\}. \tag{C.14}$$

In words, $\theta_{\ell_1}(\tau, \psi)$ is the rightmost point of intersection of the curve $\Psi_{\mathrm{total}}(\cdot,\tau)$ with the horizontal line at
the level $-\psi$ (see Figure 12). Further define

$$\kappa_{\ell_1}(\tau) := \sup\{\kappa_\star \in [0, 1] : \Psi_{\mathrm{total}}(\theta,\tau) < 0 \text{ for all } \theta \in [0, \kappa_\star)\}. \tag{C.15}$$

This function $\kappa_{\ell_1}(\tau)$ is the leftmost point of intersection of $\Psi_{\mathrm{total}}(\cdot,\tau)$ with the horizontal line at level zero.
Equations (C.14) and (C.15) define decay thresholds for the ensemble of feasible cones for the $\ell_1$ norm at
proportionally sparse vectors.
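The two definitions can be approximated on a grid once an exponent curve is available. The sketch below is an illustration only: `curve` is a hypothetical callable standing in for $\theta \mapsto \Psi_{\mathrm{total}}(\theta,\tau)$ at a fixed $\tau$, and the example curve $-(\theta - 0.4)^2$ is a made-up concave function with maximal value zero, not the exponent from this appendix.

```python
def upper_threshold(curve, level, grid_size=10001):
    """Grid approximation of (C.14): smallest theta* with curve < -level on (theta*, 1]."""
    # Scan from the right and stop at the first grid point where the curve reaches -level.
    for j in range(grid_size - 1, -1, -1):
        theta = j / (grid_size - 1)
        if curve(theta) >= -level:
            return theta
    return 0.0  # the curve stays below -level on the whole grid

def lower_threshold(curve, grid_size=10001):
    """Grid approximation of (C.15): largest kappa* with curve < 0 on [0, kappa*)."""
    # Scan from the left and stop at the first grid point where the curve reaches zero.
    for j in range(grid_size):
        theta = j / (grid_size - 1)
        if curve(theta) >= 0.0:
            return theta
    return 1.0  # the curve is negative on the whole grid

def toy_curve(theta):
    """Hypothetical concave curve with a unique maximal value of zero at theta = 0.4."""
    return -(theta - 0.4) ** 2
```

With the toy curve, `upper_threshold(toy_curve, 0.01)` is close to 0.5, the rightmost crossing of the level $-0.01$, while `upper_threshold(toy_curve, 0.0)` and `lower_threshold(toy_curve)` both land near 0.4. This mirrors the observation that the two thresholds coincide at level zero when the maximum of the curve is unique.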

Proposition C.3 (Decay thresholds for the $\ell_1$ norm at proportionally sparse vectors). Consider the
ensemble of Lemma C.1. The function $\theta_{\ell_1}(\tau,\psi)$ is an upper decay threshold at level $\psi$ for the ensemble
$\{\mathcal{F}(\|\cdot\|_{\ell_1}, x^{(d)}) : d \in \mathcal{D}\}$, while $\kappa_{\ell_1}(\tau)$ is a lower decay threshold for the ensemble $\{\mathcal{F}(\|\cdot\|_{\ell_1}, x^{(d)}) : d \in \mathcal{D}\}$.

Proof. By definition, $\Psi_{\mathrm{total}}(\theta,\tau) < -\psi$ for every $\theta > \theta_{\ell_1}(\tau,\psi)$. It then follows immediately from Lemma C.1
and Definition 4.7 that $\theta_{\ell_1}(\tau,\psi)$ is an upper decay threshold at level $\psi$ for the ensemble $\{\mathcal{F}(\|\cdot\|_{\ell_1}, x^{(d)}) : d \in \mathcal{D}\}$.
The proof that $\kappa_{\ell_1}(\tau)$ is a lower decay threshold for $\{\mathcal{F}(\|\cdot\|_{\ell_1}, x^{(d)}) : d \in \mathcal{D}\}$ is equally straightforward,
so we omit the argument. □
We abbreviate the upper decay threshold at level zero by

$$\theta_{\ell_1}(\tau) := \theta_{\ell_1}(\tau, 0) \tag{C.16}$$

for consistency with Definition 3.8.

Figure 12 illustrates the definition of $\theta_{\ell_1}(\tau,\psi)$. Numerically, it appears that zero is the unique maximal
value of $\Psi_{\mathrm{total}}(\cdot,\tau)$ for every value $\tau \in (0, 1]$. If this is indeed the case, then we would be able to deduce that
$\theta_{\ell_1}(\tau) = \kappa_{\ell_1}(\tau)$ for all values of sparsity $\tau$.

C.3 Reconciliation with [Don06b] 

We now discuss the relationship between our spherical intrinsic volume approach and the bounds 
of [Don06b, DT05] for basis pursuit. Numerically, it appears that the two approaches provide equivalent 
success guarantees, but the expressions for the exponents seem to preclude a direct proof of equivalence. 
We also describe how our approach gives matching upper bounds for the region of success of basis pursuit,
which shows that our results are the best possible up to numerical accuracy.

C.3.1 Reconciliation with the weak threshold

Recall that basis pursuit is the linear inverse problem (5.1) with objective $f(\cdot) = \|\cdot\|_{\ell_1}$. By the first part
of Lemma 5.1, basis pursuit with a Gaussian measurement matrix $\Omega \in \mathbb{R}^{\lceil \sigma d \rceil \times d}$ succeeds at recovering
a $k = \lceil \tau d \rceil$-sparse vector with overwhelming probability in high dimensions, so long as the pair $(\tau,\sigma)$
satisfies $\theta_{\ell_1}(\tau) < \sigma$.

We now describe the analogous result given in [Don06b, Sec. 7.1]. Define the critical sparsity ratio
(compare with the critical proportion [Don06b, Def. 2])

$$\tau_W(\sigma) = \sup\{\tau \in [0,\sigma] : \Psi_{\mathrm{total}}(\theta,\tau) < 0 \text{ for all } \theta \in [\sigma, 1]\}. \tag{C.17}$$

Then the result [Don06b, Thm. 2] is equivalent to the statement that basis pursuit with a Gaussian matrix
$\Omega \in \mathbb{R}^{\lceil \sigma d \rceil \times d}$ succeeds with overwhelming probability in high dimensions whenever the pair $(\tau, \sigma)$ satisfies

$$\tau < \tau_W(\sigma).$$

These two approaches show strong similarities, and both methods provide the same results to numerical
precision. Indeed, under the assumption that both $\tau_W(\sigma)$ and $\theta_{\ell_1}(\tau)$ are monotonically increasing
functions (this appears to hold empirically), one can show that these approaches are equivalent. Rather than
dwell on this fine detail, we present a matching failure region for basis pursuit.

C.3.2 Matching upper bound

The following result links the lower decay threshold to regions where basis pursuit fails.

Proposition C.4. Suppose $\kappa_{\ell_1}(\tau) > \sigma$. Then basis pursuit with $n = \lceil \sigma d \rceil$ Gaussian measurements fails with
overwhelming probability in high dimensions for the $\tau$-sparse ensemble of Lemma C.1.

Since the function $\Psi_{\mathrm{total}}(\cdot,\tau)$ has a unique maximal value of zero up to our ability to compute the
functions involved (see Figure 12), we have the equality $\kappa_{\ell_1}(\tau) = \theta_{\ell_1}(\tau)$ to numerical precision. Coupling
Proposition C.4 with our discussion in Section C.3.1 reveals that basis pursuit with a Gaussian
measurement matrix exhibits a phase transition between success and failure at $\sigma = \theta_{\ell_1}(\tau)$.

Proof of Proposition C.4. Let $\{x^{(d)} \in \mathbb{R}^d\}$ be an ensemble of $\tau$-sparse vectors as in Lemma C.1. The null
space of an $n \times d$ Gaussian matrix is distributed as $QL$, where $L$ is a linear subspace of dimension $(d-n)$. It
then follows from [CRPW10, Prop. 2.1] that basis pursuit with $n = \lceil \sigma d \rceil$ Gaussian measurements succeeds
with the same probability that $QL \cap \mathcal{F}(\|\cdot\|_{\ell_1}, x^{(d)}) = \{0\}$.

By Proposition 3.9, the value $\kappa_\star = (1 - \sigma)$ is a lower decay threshold for $L$. Our assumption implies
$\kappa_{\ell_1}(\tau) + \kappa_\star > 1$, so by Theorem B.1, we see $QL \cap \mathcal{F}(\|\cdot\|_{\ell_1}, x^{(d)}) \neq \{0\}$ with overwhelming probability in high
dimensions. We conclude that basis pursuit fails with overwhelming probability in high dimensions. □
D Proof of Lemma 5.1



This section provides the proof of the two claims of Lemma 5.1 concerning the relationship between the 
upper decay threshold and the linear inverse problem (5.1). The first result is a corollary of [CRPW10, 
Prop. 2.1]. We drop the superscript d for clarity. 

Proof of Lemma 5.1, Part 1. Let $\Omega$ be the $n \times d$ Gaussian measurement matrix, where $n = \lceil \sigma d \rceil$. The null
space of $\Omega$ is distributed as $QL$, where $Q$ is a random basis, and $L$ is a $(d-n)$-dimensional subspace of $\mathbb{R}^d$.
Therefore, the probability that (5.1) succeeds is equal to the probability that $QL \cap \mathcal{F}(f, x) = \{0\}$ [CRPW10,
Prop. 2.1]. By Proposition 3.9, the subspace $L$ has an upper decay threshold $\theta_L = 1 - \sigma$, so that the
assumption $\theta_\star < \sigma$ implies that $\theta_\star + \theta_L < 1$. The claim follows from Theorem 4.5. □

The second claim of Lemma 5.1 requires additional effort. We begin with the following technical
lemma. Below, $\mathcal{D}$ is an infinite set of indices.

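Lemma D.1. Let $\{K^{(d)} : d \in \mathcal{D}\}$ be an ensemble of closed convex cones with $K^{(d)} \subset \mathbb{R}^d$ for each $d$, and let
$\{L^{(d)} : d \in \mathcal{D}\}$ be an ensemble of linear subspaces of $\mathbb{R}^d$ of dimension $d - \lceil \sigma d \rceil$. If there exists an $\varepsilon > 0$ such
that for every sufficiently large $d$,

$$\mathbb{P}\{K^{(d)} \cap QL^{(d)} \neq \{0\}\} \le e^{-\varepsilon d},$$

then $\{K^{(d)}\}$ has an upper decay threshold $\theta_\star = \sigma$.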

Again, the spherical kinematic formula (3.2) is at the heart of the proof. 

Proof of Lemma D.1. We split the argument into two cases: first, we consider the case where $K^{(d)}$ is not a
subspace, and then consider the case where $K^{(d)}$ is a subspace. The general mixed-cone case follows by
applying these arguments to the subsequences consisting of only one type of cone.

We drop the superscript $d$ for clarity. Let $n = \lceil \sigma d \rceil$. We first assume that $K$ is not a subspace. By the
spherical kinematic formula, the probability of interest $P$ is given by

$$P := \mathbb{P}\{K \cap QL \neq \{0\}\} = \sum_{k=0}^{d-1} \left(1 + (-1)^k\right) \sum_{i=k}^{d-1} v_i(L)\, v_{d-1-i+k}(K).$$

By Proposition 3.3, $v_i(L) = \delta_{i,\,d-n-1}$, so the probability above reduces to

$$P = \sum_{k=n}^{d-1} \left(1 + (-1)^{k-n}\right) v_k(K). \tag{D.1}$$

By assumption, we have $P \le e^{-\varepsilon d}$ for all sufficiently large $d$, so the positivity of spherical intrinsic volumes
(Fact 3.2.1) implies

$$v_k(K) \le e^{-\varepsilon d}, \quad \text{for any } k \ge n \text{ such that } k \equiv n \bmod 2.$$

It requires an additional geometric observation to remove the dependence on parity. Let $\tilde{L}$ be a
$(d-n-1)$-dimensional subspace contained in $L$. By containment, it is immediate that

$$\tilde{P} := \mathbb{P}\{K \cap Q\tilde{L} \neq \{0\}\} \le \mathbb{P}\{K \cap QL \neq \{0\}\} \le e^{-\varepsilon d},$$

where the last inequality is by assumption. But the same manipulations as before show

$$\tilde{P} = \sum_{k=n+1}^{d-1} \left(1 + (-1)^{k-n-1}\right) v_k(K),$$

so we have $v_k(K) \le e^{-\varepsilon d}$ for every $k \ge n+1$ such that $k \equiv n+1 \bmod 2$. In summary, for every $d$ sufficiently
large and any $k \ge n = \lceil \sigma d \rceil$, we have $v_k(K) \le e^{-\varepsilon d}$. By definition, $\sigma$ is an upper decay threshold for the
ensemble of cones $K$. This completes the first case.
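The parity structure of the sum $\sum_{k=n}^{d-1} (1 + (-1)^{k-n})\, v_k(K)$ can be checked on a concrete cone. The sketch below uses the nonnegative orthant as an assumed test case, taking $v_k = \binom{d}{k+1} 2^{-d}$ for its spherical intrinsic volumes (an indexing assumption chosen to match the convention that a $j$-dimensional subspace has $v_i = \delta_{i,j-1}$). As a cross-check that is not part of the proof, it compares the result with Wendel's classical formula for the probability that a random subspace misses the orthant. All function names are hypothetical.

```python
from math import comb

def orthant_volumes(d):
    """Assumed test case: v_k of the nonnegative orthant in R^d for k = 0, ..., d-1."""
    return [comb(d, k + 1) / 2 ** d for k in range(d)]

def hit_probability(v, n):
    """Sum over k = n, ..., d-1 of (1 + (-1)^(k-n)) * v[k], as in (D.1)."""
    d = len(v)
    return sum((1 + (-1) ** (k - n)) * v[k] for k in range(n, d))

def constrained_indices(d, n):
    """Indices k in {n, ..., d-1} that carry a nonzero weight in (D.1)."""
    return [k for k in range(n, d) if (k - n) % 2 == 0]

def wendel_miss_probability(d, n):
    """Wendel's formula: a random (d-n)-dimensional subspace meets the orthant only at 0."""
    return sum(comb(d - 1, k) for k in range(n)) / 2 ** (d - 1)
```

For $d = 6$ and $n = 2$, `constrained_indices(6, 2)` gives `[2, 4]` while `constrained_indices(6, 3)` gives `[3, 5]`: the subspace $L$ controls one parity class and the smaller subspace of dimension $d-n-1$ controls the other, so together they cover every $k \ge n$.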
Now suppose that $K$ is a subspace and define $m := \dim(K)$. Since $\dim(L) = d - \lceil \sigma d \rceil$, we have

$$P := \mathbb{P}\{K \cap QL \neq \{0\}\} = \begin{cases} 0, & m \le \lceil \sigma d \rceil \\ 1, & \text{otherwise} \end{cases}$$

because randomly oriented subspaces are almost always in general position. The assumption that $P \le e^{-\varepsilon d}$
for all sufficiently large $d$ requires that $m \le \lceil \sigma d \rceil$ for all sufficiently large $d$. By Proposition 3.9, the
scalar $\sigma$ is an upper decay threshold for subspace $K$. This is the result for the second case, so we are
done. □
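The general-position claim is easy to probe numerically. The sketch below is an illustration under stated assumptions: it draws subspaces as spans of independent Gaussian vectors (a stand-in for random orientation) and uses the identity $\dim(K \cap L) = \dim K + \dim L - \dim(K + L)$. The helper names `rank` and `random_intersection_dim` are hypothetical, and the rank routine is a plain Gaussian elimination with a fixed tolerance.

```python
import random

def rank(rows, tol=1e-9):
    """Rank of a matrix (given as a list of rows) by elimination with partial pivoting."""
    a = [row[:] for row in rows]
    n_rows, n_cols = len(a), len(a[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(a[i][c]))
        if abs(a[pivot][c]) < tol:
            continue  # no usable pivot in this column
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(r + 1, n_rows):
            factor = a[i][c] / a[r][c]
            for j in range(c, n_cols):
                a[i][j] -= factor * a[r][j]
        r += 1
    return r

def random_intersection_dim(d, m, l, rng):
    """dim(K ∩ L) for spans of m and l Gaussian vectors in R^d, with m <= d and l <= d."""
    vectors = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m + l)]
    # Generically dim K = m and dim L = l, and dim(K + L) is the rank of all m + l vectors.
    return m + l - rank(vectors)
```

With $d = 8$ and $n = 3$, so that $\dim L = 5$, `random_intersection_dim(8, m, 5, random.Random(0))` returns 0 for $m \le 3$ and a positive value for $m > 3$, in line with the case distinction above.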



Proof of Lemma 5.1, Part 2. The results of [CRPW10] imply that the linear inverse problem (5.1) with a
Gaussian measurement matrix $\Omega$ succeeds with the same probability that a randomly oriented
$(d-n)$-dimensional subspace $QL$ strikes the feasible cone $\mathcal{F}(f, x)$ trivially. The result then follows from